Re: ZFS MFC heads down
Kirk Strauser wrote:
> So far so good here (amd64, Core2 Duo, ICH9 SATA) but I'm too chicken to upgrade the on-disk format yet.

Me too; I upgraded the pool to v13 yesterday and everything is still OK. I also removed all loader.conf tunables. Many thanks to the FreeBSD team.

(Tyan Tank GT20, 2GB memory, ICH7, amd64)
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS MFC heads down
On Fri, May 22, 2009 at 11:45 AM, Pertti Kosunen wrote:
> Me too, upgraded pool to v13 yesterday and everything still ok. Removed also
> all loader.conf tunables. Many thanks for FreeBSD team.

What about i386? Does it still need tunables?

--
Alberto Villa
[releng_7 tinderbox] failure on amd64/amd64
TB --- 2009-05-22 11:45:27 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2009-05-22 11:45:27 - starting RELENG_7 tinderbox run for amd64/amd64
TB --- 2009-05-22 11:45:27 - cleaning the object tree
TB --- 2009-05-22 11:45:48 - cvsupping the source tree
TB --- 2009-05-22 11:45:48 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s /tinderbox/RELENG_7/amd64/amd64/supfile
TB --- 2009-05-22 11:45:57 - building world
TB --- 2009-05-22 11:45:57 - MAKEOBJDIRPREFIX=/obj
TB --- 2009-05-22 11:45:57 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2009-05-22 11:45:57 - TARGET=amd64
TB --- 2009-05-22 11:45:57 - TARGET_ARCH=amd64
TB --- 2009-05-22 11:45:57 - TZ=UTC
TB --- 2009-05-22 11:45:57 - __MAKE_CONF=/dev/null
TB --- 2009-05-22 11:45:57 - cd /src
TB --- 2009-05-22 11:45:57 - /usr/bin/make -B buildworld
>>> World build started on Fri May 22 11:45:59 UTC 2009
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> stage 5.1: building 32 bit shim libraries
>>> World build completed on Fri May 22 13:19:43 UTC 2009
TB --- 2009-05-22 13:19:43 - generating LINT kernel config
TB --- 2009-05-22 13:19:43 - cd /src/sys/amd64/conf
TB --- 2009-05-22 13:19:43 - /usr/bin/make -B LINT
TB --- 2009-05-22 13:19:43 - building LINT kernel
TB --- 2009-05-22 13:19:43 - MAKEOBJDIRPREFIX=/obj
TB --- 2009-05-22 13:19:43 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2009-05-22 13:19:43 - TARGET=amd64
TB --- 2009-05-22 13:19:43 - TARGET_ARCH=amd64
TB --- 2009-05-22 13:19:43 - TZ=UTC
TB --- 2009-05-22 13:19:43 - __MAKE_CONF=/dev/null
TB --- 2009-05-22 13:19:43 - cd /src
TB --- 2009-05-22 13:19:43 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Fri May 22 13:19:43 UTC 2009
>>> stage 1: configuring the kernel
[...]
WARNING: kernel contains GPL contaminated emu10k1 headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated emu10kx headers
WARNING: kernel contains GPL contaminated maestro3 headers
WARNING: kernel contains GPL contaminated ext2fs filesystem
WARNING: kernel contains GPL contaminated ReiserFS filesystem
WARNING: kernel contains GPL contaminated xfs filesystem
*** Error code 1
Stop in /src.
*** Error code 1
Stop in /src.
TB --- 2009-05-22 13:19:44 - WARNING: /usr/bin/make returned exit code 1
TB --- 2009-05-22 13:19:44 - ERROR: failed to build lint kernel
TB --- 2009-05-22 13:19:44 - 4543.37 user 512.22 system 5656.57 real

http://tinderbox.des.no/tinderbox-releng_7-RELENG_7-amd64-amd64.full
Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)
Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(.

Joe

on 21/05/2009 19:28 Kip Macy said the following:
> I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics.
>
> Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
>
> Cheers,
> Kip
>
> On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
>> Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(.
>>
>> So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on a 500Mb partition on each of five disks, and zraid2 over the rest of each drive).
>>
>> What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).
>>
>> What happens is that the kernel hangs booting just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with acpi switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing... more on that next.
>>
>> The machine is running a gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 sata ports wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).
>>
>> Now, because of the gmirror I can boot the system on any disk, or combination of plugged in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years.
>>
>> But, now it hangs in the same place no matter what disk I boot on (I've tried every bay).
>>
>> But, without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault?
>>
>> Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do a 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand it doesn't appear to be that controller or slot in the drive bay because if I unplug all the other disks the system will boot that disk and get as far as the hang... hmm.
>>
>> Is this a consequence of disabling the ACPI?
>>
>> Does anyone have a clue what might be going on?
>>
>> Joe
Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)
Motin is your best bet in tracking down ATA problems.

Cheers,
Kip

On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
> Hi Kip,
>
> I seriously don't understand what has happened. If I boot kernel.old I still
> get the same problem. Very confusing. :(.
>
> Joe
>
> [...]

--
When bad men combine, the good must associate; else they will fall one by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke
Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching SATA cables around next to see if the problem goes away if I disconnect some combination of bays.

Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:
> Motin is your best bet in tracking down ATA problems.
>
> Cheers,
> Kip
>
> [...]
Re: net.inet.tcp.tso=1 still necessary with fxp was Re: TCP differences in 7.2 vs 7.1
On Thu, 21 May 2009, Pyun YongHyeon wrote:
> On Wed, May 20, 2009 at 05:55:29PM -0400, Michael L. Squires wrote:
>> I started having speed problems after shifting from 7.1-STABLE to 7.1-PRERELEASE. They have continued with 7.2-STABLE. Reverting to the 7.1-STABLE kernel eliminated the problem.
>>
>> After downloading 7.2-STABLE from cvsup.freebsd.org at about 10:40 AM EST on 5/20/2009, doing a buildworld/buildkernel/installkernel/installworld cycle I still need to execute "net.inet.tcp.tso=1" to eliminate throughput problems between my home system (on Comcast) and my office PC (connected via a Time-Warner connection).
>>
>> This also affects connections to other systems; downloading Web pages (ebay.com) speeds up after I change the TSO entry.
>>
>> The box in question runs NAT and has an fxp (Intel Pro100) interface connected to a Comcast cable modem and an em (Intel Pro1000) interface connected to the internal network.
>>
>> There are no network errors in "netstat -i" on either interface.
>>
>> The "if_fxp.c" code appears to be the May 7 version.
>
> You should have cvs rev. 1.266.2.15 of if_fxp.c.
>
>> This is the dmesg entry for the card in question. The system is a dual Xeon Supermicro 1U box, 1GB RAM, single 300GB IDE hard drive.
>>
>> fxp0: port 0xe400-0xe43f mem 0xfebfd000-0xfebfdfff,0xfeb8-0xfeb9 irq 27 at device 7.0 on pci0
>> miibus0: on fxp0
>
> Since you use both em(4) and fxp(4) I'd like to know which driver has the issue. Instead of disabling TSO in the network stack, try disabling TSO for each interface. For instance,
> 1. Disable TSO of em(4) and check whether you see the same issue (ifconfig em0 -tso).
> 2. Disable TSO of fxp(4) and check whether you see the same issue (ifconfig fxp0 -tso).

The version of if_fxp.c is in fact 1.266.2.15.

Connecting to the FreeBSD box from a PC with a bash shell under XP SP3/Cygwin OpenSSH I find that

(1) disabling "tso" on the internal "em0" interface has no effect; but
(2) disabling "tso" on the external "fxp0" interface eliminates the throughput problem.

The effect appears to be the same as using sysctl to disable tso on all interfaces.

With "tso" enabled on the "fxp0" interface the connection (reading email using "pine" in a large window) hung completely.

There are no errors in "netstat -i" nor in /var/log/messages. "netstat -e" on the XP PC shows no discards or errors; however, I don't think I've ever seen a PC under Windows admit to network errors.

The fxp0 interface connects to a Comcast cable modem, which eventually connects to my office PC, which is in the "iga.in.gov" domain hosted by TimeWarner.

I'll be happy to run anything else you want.

Mike Squires
UN*X at home
Since 1985
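The per-interface test suggested in this message can be sketched as a small set of shell helpers. This is only an illustration, not something posted in the thread: the helper names (`run`, `tso_off`, `tso_on`, `tso_global`) and the `TSO_APPLY` variable are hypothetical, and the helpers print the commands unless `TSO_APPLY=1` is set. The `ifconfig <if> -tso` form is taken from the message; the sketch assumes that `0` is the sysctl value that disables TSO, although the message itself writes `net.inet.tcp.tso=1`.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless TSO_APPLY=1 is set.
run() {
    if [ "${TSO_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Disable TSO on a single interface, so each driver can be tested in turn.
tso_off() {
    run ifconfig "$1" -tso
}

# Re-enable TSO on an interface once it has been ruled out.
tso_on() {
    run ifconfig "$1" tso
}

# Stack-wide switch; assumed here: 0 disables TSO, 1 enables it.
tso_global() {
    run sysctl net.inet.tcp.tso="$1"
}
```

One way to use it would be `tso_off em0`, repeat the transfer, then `tso_on em0; tso_off fxp0` and repeat again, which is the order of the two steps quoted above.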
devd panic on i386 7.2 Release with CARP
I am having a problem with one of my FreeBSD 7.2R boxes panicking on start of devd after upgrading to 7.2R. It is an old DELL 2400 dual processor. This is a build from completely refreshed sources.

- generic kernel does not panic (built by me)
- custom kernel does not panic with devd_enable="NO" set in rc.conf, but
  !!! __ I can start devd AFTER booting by hand at the command prompt!
- custom kernel (carp and more memory) does panic if devd is started automatically by rc.d scripts (the default behaviour).

Do I really need devd for anything if I am not using USB? Anyone have any idea of how to fix this?

My kernel config is pretty simple. I am building a test i386 box with a carp kernel to try and repro this on another box, but that box is really slow.

After booting I just run

kes# devd
devd: Setting hw.bus.devctl_disable to 0
kes#

And it does NOT panic. Weird, huh?

Kernel config followed by back trace:

#
# SMP -- Generic kernel configuration file for FreeBSD/i386 SMP
# Use this for multi-processor machines
#
# $FreeBSD: src/sys/i386/conf/SMP,v 1.5.6.1 2005/09/18 03:37:58 scottl Exp $
include GENERIC
ident WHI
# To make an SMP kernel, the next line is needed
#options SMP # Symmetric MultiProcessor Kernel
#options ASR_COMPAT
options MAXDSIZ="(1536*1024*1024)"
options MAXSSIZ="(512*1024*1024)"
options DFLDSIZ="(1536*1024*1024)"
device carp

kes# kgdb back trace of core here

118>
<118>#
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting: <118> interrupts <118> ethernet <118> point_to_point <118> kickstart <118>.
<118>swapon: adding /dev/da0s1b as swap device
<118>Fast boot: skipping disk checks.
GEOM_LABEL: Label for provider da0s1f is ufsid/3ec3641041d090a9.
<118>Setting hostuuid: 44454c4c-5a9b-1059-8057-b8c04f303031.
<118>Setting hostid: 0xd1c205d3.
<118>Mounting local file systems:
GEOM_LABEL: Label ufsid/3ec3641041d090a9 removed.
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
<118>.
<118>Setting hostname: kes.icarz.com.
<118>net.inet6.ip6.auto_linklocal: <118>1 <118> -> <118>0 <118>
<118>kern.maxfilesperproc: <118>11095 <118> -> <118>19000 <118>
<118>kern.maxfiles: <118>12328 <118> -> <118>2 <118>
<118>lo0: flags=8049 metric 0 mtu 16384
<118> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
<118> inet6 ::1 prefixlen 128
<118> inet 127.0.0.1 netmask 0xff00
<118>fxp0: flags=8843 metric 0 mtu 1500
<118> options=2009
<118> ether 00:b0:d0:3e:c7:19
<118> inet 207.99.22.32 netmask 0xff80 broadcast 207.99.22.127
<118> media: Ethernet autoselect (100baseTX )
<118> status: active
<118>add net default: gateway 207.99.22.1
<118>Additional routing options:
<118>.
<118>Starting devd.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0874488
stack pointer = 0x28:0xf7bd0b68
frame pointer = 0x28:0xf7bd0b68
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 388 (devd)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 2m12s
Physical memory: 2035 MB
Dumping 68 MB: 53 37 21 5

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0 doadump () at pcpu.h:196
196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0 doadump () at pcpu.h:196
#1 0xc07e2a07 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0xc07e2cd9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0xc0ae895c in trap_fatal (frame=0xf7bd0b28, eva=0) at /usr/src/sys/i386/i386/trap.c:939
#4 0xc0ae8be0 in trap_pfault (frame=0xf7bd0b28, usermode=0, eva=0) at /usr/src/sys/i386/i386/trap.c:852
#5 0xc0ae958c in trap (frame=0xf7bd0b28) at /usr/src/sys/i386/i386/trap.c:530
#6 0xc0acdc9b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7 0xc0874488 in strlen (str=0x0) at /usr/src/sys/libkern/strlen.c:41
#8 0xc080a46c in devread (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0) at /usr/src/sys/kern/subr_bus.c:458
#9 0xc07a6039 in giant_read (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0) at /usr/src/sys/kern/kern_conf.c:414
#10 0xc076cecd in devfs_read_f (fp=0xc58ba260, uio=0xf7bd0c60, cred=0xc5470300, flags=0, td=0xc56288c0) at /usr/src/sys/fs/devfs/devfs_vnops.c:1007
#11 0xc081be86 in dofileread (td=0xc56288c0, fd=3, fp=0xc58ba260,
Re: devd panic on i386 7.2 Release with CARP
On Fri, May 22, 2009 at 03:26:51PM -0400, Ken Menzel wrote:
>
> I am having a problem with one of my freebsd 7.2R boxes panicing on
> start of devd after upgrading to 7.2R. It is an old DELL 2400 dual
> processor. This is a build from completely refreshed sources.
>
> - generic kernel does not panic (built by me)
> - custom kernel does not panic with devd_enable="NO" set in rc.conf, but
> !!! __ I can start devd AFTER booting by hand at the command prompt!
>
> - custom kernel (carp and more memory ) does panic if devd is started
> automatically by rc.d scripts (the default behaviour).
>
> Do I really need devd for anything if I am not using USB? Anyone have
> any idea of how to fix this?
>
> My kernel config is pretty simple, I am building a test i386 box with a
> carp kernel to try and repro this on another box, but that box is really
> slow.
>
> After booting I just run
> kes# devd
> devd: Setting hw.bus.devctl_disable to 0
> kes#
...
> <118>lo0: flags=8049 metric 0 mtu 16384
> <118> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> <118> inet6 ::1 prefixlen 128
> <118> inet 127.0.0.1 netmask 0xff00
> <118>fxp0: flags=8843 metric 0
> mtu 1500
> <118> options=2009
> <118> ether 00:b0:d0:3e:c7:19
> <118> inet 207.99.22.32 netmask 0xff80 broadcast 207.99.22.127
> <118> media: Ethernet autoselect (100baseTX )
> <118> status: active
> <118>add net default: gateway 207.99.22.1
> <118>Additional routing options:
> <118>.
> <118>Starting devd.
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 00
> fault virtual address = 0x0
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc0874488
> stack pointer = 0x28:0xf7bd0b68
> frame pointer = 0x28:0xf7bd0b68
> code segment= base 0x0, limit 0xf, type 0x1b
>= DPL 0, pres 1, def32 1, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 388 (devd)
> trap number = 12
> panic: page fault
> cpuid = 1
> Uptime: 2m12s
> Physical memory: 2035 MB
> Dumping 68 MB: 53 37 21 5
>
> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
> /boot/kernel/acpi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/acpi.ko
> #0 doadump () at pcpu.h:196
> 196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) backtrace
> #0 doadump () at pcpu.h:196
> #1 0xc07e2a07 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2 0xc07e2cd9 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3 0xc0ae895c in trap_fatal (frame=0xf7bd0b28, eva=0)
>at /usr/src/sys/i386/i386/trap.c:939
> #4 0xc0ae8be0 in trap_pfault (frame=0xf7bd0b28, usermode=0, eva=0)
>at /usr/src/sys/i386/i386/trap.c:852
> #5 0xc0ae958c in trap (frame=0xf7bd0b28) at
> /usr/src/sys/i386/i386/trap.c:530
> #6 0xc0acdc9b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
> #7 0xc0874488 in strlen (str=0x0) at /usr/src/sys/libkern/strlen.c:41
> #8 0xc080a46c in devread (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
>at /usr/src/sys/kern/subr_bus.c:458
> #9 0xc07a6039 in giant_read (dev=0xc548b900, uio=0xf7bd0c60, ioflag=0)
>at /usr/src/sys/kern/kern_conf.c:414
> #10 0xc076cecd in devfs_read_f (fp=0xc58ba260, uio=0xf7bd0c60,
>cred=0xc5470300, flags=0, td=0xc56288c0)
>at /usr/src/sys/fs/devfs/devfs_vnops.c:1007
> #11 0xc081be86 in dofileread (td=0xc56288c0, fd=3, fp=0xc58ba260,
>auio=0xf7bd0c60, offset=-1, flags=0) at file.h:245
> #12 0xc081c1f8 in kern_readv (td=0xc56288c0, fd=3, auio=0xf7bd0c60)
>at /usr/src/sys/kern/sys_generic.c:193
> #13 0xc081c2df in read (td=0xc56288c0, uap=0xf7bd0cfc)
>at /usr/src/sys/kern/sys_generic.c:109
> ---Type to continue, or q to quit---
> #14 0xc0ae8f35 in syscall (frame=0xf7bd0d38)
>at /usr/src/sys/i386/i386/trap.c:1090
> #15 0xc0acdd00 in Xint0x80_syscall () at
> /usr/src/sys/i386/i386/exception.s:255
> #16 0x0033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

strlen() was passed a NULL pointer, which means that n1->dei_data is NULL. A brief look over the RELENG_7 code does not reveal any caller of devctl_queue_data() outside subr_bus.c, and all uses inside subr_bus.c seem to be safe. I believe the options added in the config cannot affect this behaviour.

You could add a check at the start of devctl_queue_data() to verify that data != NULL, and panic when it is NULL. That way we will see where it happens.
Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
This appears to have gone away now. I unplugged the bay that was causing the trouble, and the system booted just fine on the remaining 4 drives. Then I plugged the bay back in (live) and did an atacontrol detach/attach on that bus (I wonder why I always have to do that). The drive was seen, and ZFS resilvered itself.

I'm doing a ZFS scrub now to make sure that everything is good, and I'll do a reboot and see if it's all ok after that.

Strange, so it looks like a cable might have got a little loose or something. I wonder why that would have hung the kernel probe though.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
> Hi Alexander,
>
> I've love it if you were able to provide some insight into this problem.
>
> I'm going to try switching sata cables around next to see if the problem
> goes away if I disconnect some combination of bays.
>
> Thanks,
> Joe
>
> [...]
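The detach/attach cycle and the follow-up scrub described in this thread can be written down as a short shell sketch. It is an illustration only: the helper names (`run`, `reattach_channel`, `verify_pool`) and the `ATA_APPLY` variable are hypothetical, the channel name is whatever the system reports (the thread writes `sata4`), and the pool name `tank` in the usage note is a placeholder that does not appear in the original messages. By default the helpers only print the commands.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless ATA_APPLY=1 is set.
run() {
    if [ "${ATA_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Detach the ATA channel, wait a second, then attach it again so the
# hot-plugged drive is re-probed (the sequence quoted in the thread).
reattach_channel() {
    run atacontrol detach "$1"
    run sleep 1
    run atacontrol attach "$1"
}

# After the drive is back, start a scrub and show the pool status.
verify_pool() {
    run zpool scrub "$1"
    run zpool status "$1"
}
```

For example, `reattach_channel sata4` followed by `verify_pool tank` prints the five commands in order; setting `ATA_APPLY=1` would actually execute them, which only makes sense on the affected machine.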
RE: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
I saw really strange stuff with one bad SATA cable on my 6 drive ZFS array. It would work most of the time, but the scrub would either cough up CRC's or hang. I wound up replacing the disk *AND* the cable, and it's been fine since. This is on a SuperMicro chassis with Intel chips. YMMV -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 512-248-2683E-Mail: l...@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 -Original Message- From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Joe Karthauser Sent: Friday, May 22, 2009 3:45 PM To: Alexander Motin Cc: freebsd-stable@freebsd.org; Kip Macy Subject: Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)) This appears to have gone away now. I unplugged the bay that was causing the trouble, and the system booted just fine on the remaining 4 drives. Then I plugged the bay back in (live) and did an atacontrol detach/attach on that bus (I wonder why I always have to do that). The drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to make sure that everything is good, and I'll do a reboot and see if it's all ok after that. Strange, so it looks like a cable might have got a little loose or something. I wonder why that would have hung the kernel probe though. Joe on 22/05/2009 20:40 Joe Karthauser said the following: > Hi Alexander, > > I've love it if you were able to provide some insight into this problem. > > I'm going to try switching sata cables around next to see if the problem > goes away if I disconnect some combination of bays. > > Thanks, > Joe > > on 22/05/2009 19:39 Kip Macy said the following: >> Motin is your best bet in tracking down ATA problems. >> >> Cheers, >> Kip >> >> >> On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote: >>> Hi Kip, >>> >>> I seriously don't understand what has happened. If I boot kernel.old >>> I still >>> get the same problem. 
Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
I spoke too soon. It must have just randomly booted, because it is now hanging again. No amount of jiggling cables has made any difference. :(.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching sata cables around next to see if the problem goes away if I disconnect some combination of bays.

Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:

Motin is your best bet in tracking down ATA problems.

Cheers,
Kip

On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(.

Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics. Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip

On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(.

So, it's a RAID-Z2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on a 500MB partition on each of five disks, and RAID-Z2 over the rest of each drive).

What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with acpi switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing; more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 sata ports wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or combination of plugged in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years.

But, now it hangs in the same place no matter what disk I boot on (I've tried every bay).

But, without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do a 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm.

Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe
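For anyone retrying the workaround quoted above, the detach, pause, attach sequence could be wrapped in a small sh helper along these lines. This is only a sketch: `run`, `reattach_channel` and `DRY_RUN` are names made up for the example, the channel name is passed in as-is (the message uses sata4; check what your own system calls it), and by default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the re-probe sequence from the message:
#   atacontrol detach <channel>; sleep 1; atacontrol attach <channel>
# run, reattach_channel and DRY_RUN are hypothetical names for this example.

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"      # show what would be executed
    else
        "$@"                    # actually execute the command
    fi
}

reattach_channel() {
    channel=$1                  # channel name exactly as the caller gives it
    run atacontrol detach "$channel" &&
    run sleep 1 &&
    run atacontrol attach "$channel"
}
```

Calling `reattach_channel sata4` prints the three commands; running it as root with `DRY_RUN=0` would issue them for real. Stopping at the first failing step (the `&&` chain) is a choice made for the sketch, not something the thread prescribes.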
Re: net.inet.tcp.tso=1 still neceesary with fxp was Re: TCP differences in 7.2 vs 7.1
On Fri, May 22, 2009 at 03:50:07PM -0400, Michael L. Squires wrote:
>
> On Thu, 21 May 2009, Pyun YongHyeon wrote:
>
> >On Wed, May 20, 2009 at 05:55:29PM -0400, Michael L. Squires wrote:
> >>I started having speed problems after shifting from 7.1-STABLE to
> >>7.1-PRERELEASE. They have continued with 7.2-STABLE.
> >>
> >>Reverting to the 7.1-STABLE kernel eliminated the problem.
> >>
> >>After downloading 7.2-STABLE from cvsup.freebsd.org at about 10:40 AM EST
> >>on 5/20/2009, doing a buildworld/buildkernel/installkernel/installworld
> >>cycle I still need to execute "net.inet.tcp.tso=1" to eliminate throughput
> >>problems between my home system (on Comcast) and my office PC (connected
> >>via a Time-Warner connection). This also affects connections to other
> >>systems; downloading Web pages (ebay.com) speeds up after I change the TSO
> >>entry.
> >>
> >>The box in question runs NAT and has an fxp (Intel Pro100) interface
> >>connected to a Comcast cable modem and an em (Intel Pro1000) interface
> >>connected to the internal network.
> >>
> >>There are no network errors in "netstat -i" on either interface.
> >>
> >>The "if_fxp.c" code appears to be the May 7 version.
> >>
> >
> >You should have cvs rev. 1.266.2.15 of if_fxp.c.
> >
> >>This is the dmesg entry for the card in question. The system is a dual
> >>Xeon Supermicro 1U box, 1GB RAM, single 300GB IDE hard drive.
> >>
> >>fxp0: port 0xe400-0xe43f mem
> >>0xfebfd000-0xfebfdfff,0xfeb8-0xfeb9 irq 27 at device 7.0 on pci0
> >>miibus0: on fxp0
> >>
> >
> >Since you use both em(4) and fxp(4) I'd like to know which driver
> >has the issue. Instead of disabling TSO of network stack try
> >disabling TSO for each interface. For instance,
> >1. Disable TSO of em(4) and check you see the same issue
> >   (ifconfig em0 -tso).
> >2. Disable TSO of fxp(4) and check you see the same issue
> >   (ifconfig fxp0 -tso).
> >
>
> The version of if_fxp.c is in fact 1.266.2.15.
>
> Connecting to the FreeBSD box from a PC with a bash shell under XP
> SP3/Cygwin OpenSSH I find
>
> (1) disabling "tso" on the internal "em0" interface has no effect; but
>
> (2) disabling "tso" on the external "fxp0" interface eliminates the
> throughput problem. The effect appears to be the same as using sysctl to
> disable tso on all interfaces.
>
> With "tso" enabled on the "fxp0" interface the connection (reading email
> using "pine" in a large window) hung completely.
>
> There are no errors in "netstat -i" nor in /var/log/messages.
>
> "netstat -e" on the XP PC shows no discards or errors; however, I don't
> think I've ever seen a PC under Windows admit to network errors.
>
> The fxp0 interface connects to a Comcast cable modem, which eventually
> connects to my office PC which is in the "iga.in.gov" domain hosted by
> TimeWarner.
>
> I'll be happy to run anything else you want.
>

Would you capture the failing TCP session with tcpdump and mail me the URL of the captured file (off-list)?
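The per-interface test and the capture requested in this message could be scripted roughly as follows. Treat it as a sketch rather than anything the posters ran: `run`, `tso_off`, `tso_on`, `capture` and `DRY_RUN` are hypothetical names, the pcap path in the usage note is an arbitrary choice, and the capture options (`-s 0` for full packets, `-w` to write a file) are one reasonable selection, not something specified in the thread.

```shell
#!/bin/sh
# Sketch: toggle TSO one interface at a time, then capture a failing session.
# run, tso_off, tso_on, capture and DRY_RUN are hypothetical names.

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"      # show what would be executed
    else
        "$@"                    # actually execute the command
    fi
}

# Disable TSO on a single interface, as suggested (ifconfig <if> -tso).
tso_off() { run ifconfig "$1" -tso; }

# Re-enable TSO so the next interface is tested in isolation.
tso_on() { run ifconfig "$1" tso; }

# Write full-length packets seen on an interface to a capture file.
capture() { run tcpdump -i "$1" -s 0 -w "$2"; }
```

A test pass might be `tso_off em0`, retest, `tso_on em0`, `tso_off fxp0`, retest; with TSO back on for fxp0, `capture fxp0 /tmp/fxp0-tso.pcap` would record the hanging session. Set `DRY_RUN=0` and run as root to execute instead of print.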
Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
Hi.

Joe Karthauser wrote:
> I spoke too soon. It must have just randomly booted, because it is now hanging again. No amount of jiggling cables has made any difference.

Can you provide verbose boot messages of your system from the beginning up to the problem? Especially everything related to ATA. Do you have AHCI mode enabled in the BIOS, or are you using legacy ATA emulation?

--
Alexander Motin