Re: Curious failure of ZFS snapshots
On Fri, 21 Nov 2008 08:16:35 -0800 Freddie Cash <[EMAIL PROTECTED]> wrote about Re: Curious failure of ZFS snapshots:

FC> > GK> mclane# ll /tank/home/pt/.zfs/
FC> > GK> ls: snapshot: Bad file descriptor
FC> > GK> total 0

FC> Which shell are you using? I've seen quite a few
FC> different "non-existent"/"invalid directory" errors when using tcsh
FC> to navigate through the .zfs/ hierarchy. Can do "cd ..", "ls .", or
FC> tab completion when in anything under .zfs/

Standard root login, so it's /bin/csh. I cannot remember if I tried to cd into the dir, and after rebooting everything's fine up to now. I will try this if I see the problem again. However, it would be rather strange if this was shell-dependent, as all other snapshots were happily accessible with csh (and the panic after trying to unmount the fs is definitely not an expected behaviour either :-).

FC> Using sh or zsh, these errors don't occur.
FC> Just curious if this is the same kind of thing.

I will try it when I see the problem next time.

cu
Gerrit
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: R: Re: R: Re: 6.4-RC2 crashes after a few minutes of uptime
On 2008-11-24, at 1:51, Barbara wrote:

About kgdb... I never used freebsd-update, so sorry if I'm saying something stupid, but could it be the case that the kernel has been built without debugging symbols or something like that? Does freebsd-update provide a kernel.debug?

I haven't had to use the kernel.debug file in the obj dir in a long time. As far as I know, these days, the GENERIC kernel includes debug symbols. And in cases when there aren't any debug symbols, that shouldn't prevent kgdb from loading, I wouldn't think.

Hello, I had a kernel panic some hours ago, but I think that's related to a problem with one of my HDs. I've got a dump in /var/crash, and as you were interested, I ran:

# kgdb /boot/kernel/kernel /var/crash/vmcore.6
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
Terminated

I had to pkill kgdb as it was in a loop. Running it against kernel.debug in /usr/obj/usr/src/sys/$KERNCONF/ worked as expected. I've always done it this way, so I don't know if it was working with earlier releases.

Ah, well you must not be using GENERIC then, because it does have the debugging symbols. I think this is the setting in the GENERIC config that controls it:

makeoptions DEBUG=-g

But I guess what you're doing works if you're using a custom kernel that does not have that config setting.

- rory

I'm not using GENERIC but I have makeoptions DEBUG=-g in my KERNCONF.

Barbara

Ah, so you had the exact same results I got when using /boot/kernel/kernel. So, that answers that question then: apparently I do need to build a kernel.debug to get a backtrace on 6.4.

So, it looks like maybe things are different in 6 than I had remembered. I haven't looked at the 6.4-RC2 notebook to see what the kernel directory has, but on my 7.0 server at least, I've noticed that kgdb(1) does work with /boot/kernel/kernel, and I think it might have to do with putting the symbols in a separate kernel.symbols file. So, I assume that this doesn't exist on 6.

However, I did notice that if I remove that file and run kgdb again (on 7.0), I also get the structure pointer error that you get. It doesn't lock up, and I can still get a backtrace, but the output is more terse: it shows function names, but without the corresponding source file names and line numbers. So the symbols file, it seems, adds more debugging information than the kernel provides by itself. Maybe that makeoptions directive does different things on each version.

Thank you for your feedback with this, much appreciated. Now, to see if I can build a kernel.debug on that machine and get a backtrace -- though it sure sounds like a problem with ata(4).

- rory
Re: shutdown -p now crashes
Ganbold wrote:

(kgdb) p *fsrootvp
$3 = {v_type = VDIR, v_tag = 0xc0864e51 "ufs", v_op = 0xc0926280,
  v_data = 0xc3e5d000, v_mount = 0xc3e56b30,
  v_nmntvnodes = {tqe_next = 0xc3d119b4, tqe_prev = 0xc3e56b98},
  v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0, vu_yield = 0},
  v_hashlist = {le_next = 0x0, le_prev = 0xc3d09da0}, v_hash = 2,
  v_cache_src = {lh_first = 0x0},
  v_cache_dst = {tqh_first = 0x0, tqh_last = 0xc3d11af8}, v_dd = 0x0,
  v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
  v_lock = {lk_object = {lo_name = 0xc0864e51 "ufs", lo_type = 0xc0864e51 "ufs",
      lo_flags = 70844416,
      lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}},
    lk_interlock = 0xc0956510, lk_flags = 262208, lk_sharecount = 0,
    lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51,
    lk_lockholder = 0xc3b31d20, lk_newlock = 0x0},
  v_interlock = {lock_object = {
      lo_name = 0xc086fb51 "vnode interlock",
      lo_type = 0xc086fb51 "vnode interlock", lo_flags = 16973824,
      lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}},
    mtx_lock = 3283295520, mtx_recurse = 0},
  v_vnlock = 0xc3d11b20, v_holdcnt = 2, v_usecount = 0, v_iflag = 0,
  v_vflag = 1, v_writecount = 0,
  v_freelist = {tqe_next = 0x0, tqe_prev = 0x0},
  v_bufobj = {bo_mtx = 0xc3d11b50,
    bo_clean = {bv_hd = {tqh_first = 0xe3d02594, tqh_last = 0xe3d025cc},
      bv_root = 0xe3d02594, bv_cnt = 1},
    bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc3d11b9c},
      bv_root = 0x0, bv_cnt = 0},
    bo_numoutput = 0, bo_flag = 0, bo_ops = 0xc091ae00, bo_bsize = 16384,
    bo_object = 0xc106183c, bo_synclist = {le_next = 0x0, le_prev = 0x0},
    bo_private = 0xc3d11ac8, __bo_vnode = 0xc3d11ac8},
  v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0}
(kgdb) p rootvnode
$4 = (struct vnode *) 0x0
(kgdb) p *rootvnode
Cannot access memory at address 0x0
(kgdb)

Konstantin,

I have tried your patch. It seems like it is working, tried "shutdown -p now" 2 times and my RELENG_7 didn't crash after using zfs/geli external HDD via USB.

Attached patches are for RELENG_7 (small modifications made in order to apply to RELENG_7).

thanks a lot,
Ganbold

--
If you think education is expensive, try ignorance. -- Derek Bok, president of Harvard

--- opensolaris_kobj.c~ 2008-04-17 09:23:29.0 +0800
+++ opensolaris_kobj.c 2008-11-24 14:28:01.0 +0800
@@ -67,17 +67,25 @@
 kobj_open_file_vnode(const char *file)
 {
 	struct thread *td = curthread;
+	struct filedesc *fd;
 	struct nameidata nd;
 	int error, flags;
 
-	if (td->td_proc->p_fd->fd_rdir == NULL)
-		td->td_proc->p_fd->fd_rdir = rootvnode;
-	if (td->td_proc->p_fd->fd_cdir == NULL)
-		td->td_proc->p_fd->fd_cdir = rootvnode;
+	fd = td->td_proc->p_fd;
+	FILEDESC_XLOCK(fd);
+	if (fd->fd_rdir == NULL) {
+		fd->fd_rdir = rootvnode;
+		vref(fd->fd_rdir);
+	}
+	if (fd->fd_cdir == NULL) {
+		fd->fd_cdir = rootvnode;
+		vref(fd->fd_cdir);
+	}
+	FILEDESC_XUNLOCK(fd);
 
 	flags = FREAD;
-	NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, file, td);
-	error = vn_open_cred(&nd, &flags, 0, td->td_ucred, NULL);
+	NDINIT(&nd, LOOKUP, MPSAFE, UIO_SYSSPACE, file, td);
+	error = vn_open_cred(&nd, &flags, O_NOFOLLOW, td->td_ucred, NULL);
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0)
 		return (NULL);
@@ -122,12 +130,15 @@
 	struct thread *td = curthread;
 	struct vattr va;
 	int error;
-
+	int vfslocked;
+
+	vfslocked = VFS_LOCK_GIANT(vp->v_mount);
 	vn_lock(vp, LK_SHARED | LK_RETRY, td);
 	error = VOP_GETATTR(vp, &va, td->td_ucred, td);
 	VOP_UNLOCK(vp, 0, td);
 	if (error == 0)
 		*size = (uint64_t)va.va_size;
+	VFS_UNLOCK_GIANT(vfslocked);
 	return (error);
 }
 
@@ -161,6 +172,7 @@
 	struct uio auio;
 	struct iovec aiov;
 	int error;
+	int vfslocked;
 
 	bzero(&aiov, sizeof(aiov));
 	bzero(&auio, sizeof(auio));
@@ -176,9 +188,11 @@
 	auio.uio_resid = size;
 	auio.uio_td = td;
 
+	vfslocked = VFS_LOCK_GIANT(vp->v_mount);
 	vn_lock(vp, LK_SHARED | LK_RETRY, td);
 	error = VOP_READ(vp, &auio, IO_UNIT | IO_SYNC, td->td_ucred);
 	VOP_UNLOCK(vp, 0, td);
+	VFS_UNLOCK_GIANT(vfslocked);
 	return (error != 0 ? -1 : size - auio.uio_resid);
 }
 
@@ -213,8 +227,11 @@
 		struct vnode *vp = file->ptr;
 		struct thread *td = curthread;
 		int flags = FREAD;
-
+		int vfslocked;
+
+		vfslocked = VFS_LOCK_GIANT(vp->v_mount);
 		vn_close(vp, flags, td->td_ucred, td);
+		VFS_UNLOCK_GIANT(vfslocked);
 	}
 	kmem_free(file, sizeof(*file));
 }
--- vnode.h~2008-04-17 09:23:30.0 +0800
+++ vnode.h 2008-11-
Problem with Adaptec 29320LPE
Is there a problem with the Adaptec 29320LPE (PCIe x1, single-channel Ultra320) SCSI controller under FreeBSD 7?

I've recently received a server with this controller, which is intended to be used to connect to Sony AIT tape libraries for backup. Unfortunately, it does not seem to function properly. It sees the connected devices without any difficulty, but fails to write to any connected drives, and produces very strange errors when attempting to address the libraries.

That is, when attempting to write to a drive, the drive is seen as present, but any attempt actually to write results in an error (an end of tape is reported) without any data being written (mt status reports the tape at File Number 0, Record number 0).

Additionally, attempting to address the changers produces erratic results. Sometimes, the result is normal, but at other times the results are garbled, and syslog reports a string of errors from the controller, followed by a long string of errors on 'ch' (see below).

I am reasonably certain that the errors are not related to the tape libraries, as a) the libraries worked normally on the old server, and b) after installing a different controller (Adaptec 29160), the libraries function properly on the new machine. And I am reasonably sure that the problem is not a 320/160 problem, as setting the new controller to 160 in the BIOS does not help.

The system is currently running FreeBSD 7.1-PRERELEASE: Wed Nov 19 11:33:15 CET 2008, from sources csup'ed immediately prior to the build. The kernel is very close to GENERIC, but with various cardbus, wlan, and usb support removed.

Searching has indicated some similar-looking errors reported, but all from rather a long time ago (2000-2002).
backuphost# camcontrol devlist
 at scbus0 target 0 lun 0 (pass0,ch3)
 at scbus0 target 1 lun 0 (sa3,pass1)
 at scbus0 target 2 lun 0 (pass2,ch4)
 at scbus0 target 3 lun 0 (sa4,pass3)
 at scbus1 target 0 lun 0 (da0,pass4)
 at scbus1 target 0 lun 1 (da1,pass5)

backuphost# chio -f /dev/ch2 status
picker 0:
slot 0:
slot 1:
slot 2:
slot 3:
slot 4:
slot 5:
slot 6:
slot 7:
slot 8:
slot 9:
slot 10:
slot 11:
slot 12:
slot 13:
slot 14:
slot 15:
drive 0:

backuphost# chio -f /dev/ch2 status
picker 0:
slot 8:
slot 9:
slot 10:
slot 11:
slot 12:
slot 13:
slot 14:
slot 15:
slot 8:
slot 9:
slot 10:
slot 11:
slot 12:
slot 13:
slot 14:
slot 0:
drive 0:
backuphost#

Nov 20 17:53:08 backuphost kernel: ahd0: port 0x4400-0x44ff, 0x4000-0x40ff mem 0xda60-0xda601fff irq 18 at device 4.0 on pci10
Nov 20 17:53:08 backuphost kernel: ahd0: [ITHREAD]
Nov 20 17:53:08 backuphost kernel: aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs

Nov 20 15:01:16 backuphost kernel: ahd0: Transmission error detected
Nov 20 15:01:16 backuphost kernel: LQISTAT1[0x0] LASTPHASE[0x40]:(P_DATAIN) SCSISIGI[0x40]:(P_DATAIN)
Nov 20 15:01:16 backuphost kernel: PERRDIAG[0xd0]:(PARITYERR|HIPERR|HIZERO)
Nov 20 15:01:16 backuphost kernel: >> Dump Card State Begins <
Nov 20 15:01:16 backuphost kernel: ahd0: Dumping Card State at program address 0x3b Mode 0x22
Nov 20 15:01:16 backuphost kernel: Card was paused
Nov 20 15:01:16 backuphost kernel: INTSTAT[0x8]:(SCSIINT) SELOID[0x0] SELID[0x10] HS_MAILBOX[0x0]
Nov 20 15:01:16 backuphost kernel: INTCTL[0xc0]:(SWTMINTEN|SWTMINTMASK) SEQINTSTAT[0x10]:(SEQ_SWTMRTO)
Nov 20 15:01:16 backuphost kernel: SAVED_MODE[0x11] DFFSTAT[0x19]:(CURRFIFO_1|FIFO0FREE)
Nov 20 15:01:16 backuphost kernel: SCSISIGI[0xb6]:(P_MESGOUT|REQI|BSYI|ATNI) SCSIPHASE[0x4]:(MSG_OUT_PHASE)
Nov 20 15:01:16 backuphost kernel: SCSIBUS[0xc0] LASTPHASE[0x40]:(P_DATAIN) SCSISEQ0[0x0]
Nov 20 15:01:16 backuphost kernel: SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI) SEQCTL0[0x0] SEQINTCTL[0x0]
Nov 20 15:01:16 backuphost kernel: SEQ_FLAGS[0x20]:(DPHASE) SEQ_FLAGS2[0x0] QFREEZE_COUNT[0x40a]
Nov 20 15:01:16 backuphost kernel: KERNEL_QFREEZE_COUNT[0x40a] MK_MESSAGE_SCB[0xff00]
Nov 20 15:01:16 backuphost kernel: MK_MESSAGE_SCSIID[0xff] SSTAT0[0x2]:(SPIORDY) SSTAT1[0x11]:(REQINIT|PHASEMIS)
Nov 20 15:01:16 backuphost kernel: SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO)
Nov 20 15:01:16 backuphost kernel: LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0]
Nov 20 15:01:16 backuphost kernel: LQOSTAT1[0x0] LQOSTAT2[0x0]
Nov 20 15:01:16 backuphost kernel:
Nov 20 15:01:16 backuphost kernel: SCB Count = 512 CMDS_PENDING = 1 LASTSCB 0x CURRSCB 0x1ff NEXTSCB 0x0
Nov 20 15:01:16 backuphost kernel: qinstart = 4230 qinfifonext = 4230
Nov 20 15:01:16 backuphost kernel: QINFIFO:
Nov 20 15:01:16 backuphost kernel: WAITING_TID_QUEUES:
Nov 20 15:01:16 backuphost kernel: Pending list:
Nov 20 15:01:16 backuphost kernel: 511 FIFO_USE[0x0] SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x7]
Nov 20 15:01:16 backuphost kernel: Total 1
Nov 20 15:01:16 backuphost kernel: Ke
Re: Problem with Adaptec 29320LPE
Hi Greg,

On Mon, Nov 24, 2008 at 12:42:49PM +0100, Greg Byshenk wrote:
> backuphost# camcontrol devlist
> at scbus0 target 0 lun 0 (pass0,ch3)
> at scbus0 target 1 lun 0 (sa3,pass1)
> at scbus0 target 2 lun 0 (pass2,ch4)
> at scbus0 target 3 lun 0 (sa4,pass3)
> at scbus1 target 0 lun 0 (da0,pass4)
> at scbus1 target 0 lun 1 (da1,pass5)

Are these volumes perhaps >2TB ? If so, it won't work... we stumbled on this at work a few weeks ago, and once we resized the volumes so that'd all be <2TB, the controller worked fine...

As far as I know, this is the only workaround - I couldn't see relevant patches in Open/NetBSD either that might have fixed this issue :-(

Regards,

--
Rink P.W. Springer - http://rink.nu
"Anyway boys, this is America. Just because you get more votes doesn't mean you win." - Fox Mulder
Re: Problem with Adaptec 29320LPE
On Mon, Nov 24, 2008 at 12:49:12PM +0100, Rink Springer wrote:
> Hi Greg,
>
> On Mon, Nov 24, 2008 at 12:42:49PM +0100, Greg Byshenk wrote:
> > backuphost# camcontrol devlist
> > at scbus0 target 0 lun 0 (pass0,ch3)
> > at scbus0 target 1 lun 0 (sa3,pass1)
> > at scbus0 target 2 lun 0 (pass2,ch4)
> > at scbus0 target 3 lun 0 (sa4,pass3)
> > at scbus1 target 0 lun 0 (da0,pass4)
> > at scbus1 target 0 lun 1 (da1,pass5)
>
> Are these volumes perhaps >2TB ? If so, it won't work... we stumbled on
> this at work a few weeks ago, and once we resized the volumes so that'd
> all be <2TB, the controller worked fine...
>
> As far as I know, this is the only workaround - I couldn't see relevant
> patches in Open/NetBSD either that might have fixed this issue :-(

The volume da1 is indeed >2TB, but it is not connected to the controller; it (along with da0) is actually a RAID-10 array connected to a 3Ware/AMCC SATA controller. The Adaptec controller is used only for the tape drives (the SDX-900V is AIT4; the SDX-1100 is AIT5), and they are <2TB.

--
greg byshenk - [EMAIL PROTECTED] - Leiden, NL
Re: Integrated RTL8168/8111 NIC not assigned interface
I upgraded to 7.1-PRERELEASE and it works now. Thank you!

Peter C. Lai-2 wrote:
>
> On 2008-11-22 08:09:31AM +1100, Peter Jeremy wrote:
>> On 2008-Nov-21 00:07:26 -0800, hamtilla <[EMAIL PROTECTED]> wrote:
>> >I'm running 7.0-RELEASE-i386 on Jetway's NC92-N230 mainboard. The board has
>> >one integrated RTL8168/8111 gigabit NIC as well as an expansion board with
>> >three RTL8168/8111 NICs. Why would the three NICs work while the onboard NIC
>> >does not?
>> >
>> >[EMAIL PROTECTED]:1:0:0: class=0x02 card=0x816810ec chip=0x816810ec rev=0x02 hdr=0x00
>> >vendor = 'Realtek Semiconductor'
>> >device = 'RTL8168/8111 PCI-E Gigabit Ethernet NIC'
>> >class = network
>> >subclass = ethernet
>> >[EMAIL PROTECTED]:2:4:0: class=0x02 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
>> >vendor = 'Realtek Semiconductor'
>> >device = 'RTL8169/8110 Family Gigabit Ethernet NIC'
>> >class = network
>> >subclass = ethernet
>> ...
>>
>> The on-board NIC is a different type to your expansion cards (note the
>> different 'chip=' values). Looking at the code, it appears that only
>> some variants of the RTL8168 are supported in 7.x. Unfortunately, pciconf
>> doesn't report the actual hardware revision, so you can't tell from the
>> pciconf output whether it's supported or not.
>>
>> Can you report the output of 'pciconf -r pci0:1:0:0 0x40' (which should
>> report the hw revision) and 'pciconf -r pci0:2:4:0 0x40' (which gives
>> me a double-check).
>>
>> You could try booting -current and see if the on-board NIC works there -
>> the range of supported NICs has changed.
>>
>> --
>> Peter Jeremy
>> Please excuse any delays as the result of my ISP's inability to implement
>> an MTA that is either RFC2821-compliant or matches their claimed behaviour.
>
> Yes, 7.0-R is pretty old in terms of re(4) work. I believe yongari@
> is still working on this driver. 7.1 is close enough for patching
> with patches from http://people.freebsd.org/~yongari/re/
>
> Currently development is stifled because he has to basically guess
> the appropriate magic values for various PHY permutations in these
> 8111C/8168C gigabit cards everyone seems to be putting in their
> motherboards these days.
>
> --
> Peter C. Lai | Bard College at Simon's Rock
> Systems Administrator | 84 Alford Rd.
> Information Technology Svcs. | Gt. Barrington, MA 01230 USA
> peter AT simons-rock.edu | (413) 528-7428
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
On Thu, Sep 11, 2008 at 11:56 AM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
>> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
>> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
>> >> My box crashed again:
>> >>
>> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
>> >> cpuid = 0
>> >> Uptime: 33d11h12m58s
>> >> Dumping 3327 MB (2 chunks)
>> >> chunk 0: 1MB (151 pages) ... ok
>> >> chunk 1: 3327MB (851568 pages) <---hung here
>> >>
>> >> Still no valid dump.
>> >>
>> >> There is 4gig of physical memory in the machine.
>> >>
>> >> In /boot/loader.conf, I currently have the following:
>> >>
>> >> vm.kmem_size=1G
>> >> vm.kmem_size_max=1G
>> >> vm.kmem_size_scale=2
>> >>
>> >> and in my kernel conf file I have:
>> >>
>> >> options KVA_PAGES=512
>> >>
>> >> It stayed up for 33 days this time. Is there anything else I can do?
>> >
>> > First and foremost: are you using ZFS on this machine? If so, there are
>> > many tunables you can apply to try and limit this; I'm willing to bet
>> > it's ARC which is doing it. See below.
>> >
>> > In general, it appears that you need to increase the maximum range of
>> > kmem. The kernel attempted to utilise more than 1GB, and your limit is
>> > 1G. My machines running RELENG_7 on amd64, with only 2GB of RAM
>> > installed, use the following tunables in loader.conf:
>> >
>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"
>> >
>> > If ZFS is in use, I recommend these as well:
>> >
>> > vfs.zfs.arc_min="16M"
>> > vfs.zfs.arc_max="64M"
>> > vfs.zfs.prefetch_disable="1"
>> >
>> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
>> > have in the machine, with regards to RELENG_7, will not help. This is a
>> > known limitation which has been fixed in HEAD/CURRENT (where the limit
>> > has been increased to 512GB). See the "Kernel" section below; you'll
>> > see the applicable item.
>> >
>> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>> >
>> > Your only solution may be to run HEAD/CURRENT.
>>
>> I am not running ZFS. My file systems are ufs.
>>
>> This feels like some sort of memory leak in the kernel. Giving it
>> more and more memory just seems to delay the crash. Are you saying
>> the crash is fixed in HEAD/CURRENT?
>
> It's an intentional crash, not "the program tried to access NULL, which
> crashed the machine" crash. The kernel wants more memory to accomplish
> a certain thing, and it's not available. kris@ can explain this in
> better terms than I can.
>
> First and foremost, it would be good to find out what all you are
> running on this machine (process-wise). A process could be tickling
> something in the kernel which requires a large amount of memory to be
> required. I can imagine something like MySQL would require this.
>
> Ideally what needs to happen is to debug the kernel or get a full map
> of kmem to find out what's using what. I believe vmstat -m or vmstat -z
> output might help.
>
> Obviously since the machine panics, you won't be able to run those
> commands after the fact. I would recommend you set up a cronjob that
> runs every 1-2 minutes and logs the output of both of those commands
> to a file. When the panic happens, restart the system and look at
> the logfile to see if you can figure out if anything suddenly starts
> taking up a large amount of memory, or if it's a gradual thing
> (indicating a memory leak).
>
> If you can figure out what might be tickling the problem, you can
> ultimately figure out if increasing kmem is the right thing to do, or if
> there's a greater problem here.
>
>> I'm running 6.3 by the way.
>>
>> I have put your changes into my loader.conf, we'll see how long it
>> goes this time. I'm not quite in position to update everything to 7.x
>> at the moment.
>
> Our production webservers run RELENG_6 and RELENG_7, and we don't
> encounter this kind of problem. I'm not saying what you're experiencing
> is indicative of hardware issues or something like that -- I'm simply
> saying I have loaded systems which don't ever hit that condition. So
> figuring out what's causing it in your case would be good.
>

This appears to be too high as the machine reboots immediately after the fsck:

>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"

Returning it to 1G, it panics again about a month later.

Here's vmstat -m and -z roughly 1 minute before it crashed (I was logging to a file every minute via cron):

Fri Nov 21 15:15:00 EST 2008
Type InUse MemUse HighUse Requests Size(s)
pfs_vncache 2 1K - 864205 32
GEOM 16824K - 416279 16,32,64,128,256,512,1024,2048,4096
isadev 17 2K - 17 64
CAM periph 1 1K - 1 128
cdev 26 4K - 26 128
CAM queue 3 1K - 3 16
file desc 739 4
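
For anyone going through per-minute logs like the one above, a rough sketch of how two `vmstat -m` snapshots could be compared to see which malloc types grew the most. This is only an illustration: the function names (parse_vmstat_m, growth) are made up here, and the row pattern assumes the "Type InUse MemUse HighUse Requests Size(s)" column layout shown in this thread, with whitespace between the columns intact.

```python
import re

# One data row of `vmstat -m`: Type, InUse, MemUse (in K), HighUse, Requests.
# Type names may contain spaces (e.g. "CAM periph"), so the name is matched
# lazily and everything after it must look like the numeric columns.
# Assumption: columns are separated by whitespace, as in stock vmstat output.
ROW = re.compile(
    r"^\s*(?P<type>\S.*?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K"
    r"\s+\S+\s+(?P<requests>\d+)\b"
)


def parse_vmstat_m(text):
    """Return {malloc type: MemUse in KB} for one snapshot.

    Header lines, date stamps and truncated rows do not match ROW
    and are skipped.
    """
    usage = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            usage[m.group("type")] = int(m.group("memuse"))
    return usage


def growth(before, after, top=5):
    """Return up to `top` (type, KB gained) pairs, largest gain first.

    Types that shrank or stayed flat are left out; a type missing from
    `before` is treated as having started at 0 KB.
    """
    deltas = [(name, kb - before.get(name, 0)) for name, kb in after.items()]
    deltas = [d for d in deltas if d[1] > 0]
    deltas.sort(key=lambda d: (-d[1], d[0]))
    return deltas[:top]
```

Feeding it the first and the last snapshot from the log should show whether one type climbs steadily (which would point towards a leak) or whether the growth only appears shortly before the panic.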
ext2 inode size patch - RE: PR kern/124621
A while back, I submitted a patch for PR kern/124621, which allows the mounting of an ext2(3) filesystem created with an inode size other than 128. The e2fsprogs' default is now 256, so file systems created on newer Linux distributions or with the port will not be mountable.

I was hopeful this would get committed in time for 7.1-RELEASE (and 6.4-RELEASE), however the PR remains open. If there is an issue with the patch itself, I would be glad to fix it. I'm posting to fs@ because hopefully some folks more experienced with file system/kernel code can have a look and see if the patch is ok to commit.

I've seen a few people in ##freebsdhelp on Freenode as well as #freebsdhelp on EFnet with this problem, and have had them test this patch out with success (and no obvious adverse effects), so I was hoping it could be committed in time for 7.1-RELEASE. Since 6.4 is so close to release, I'm not so sure about that.

Anyway, I would appreciate it if the patch could get some review to see if it can be committed in time.

Regards,
Josh
no priority on the console?
As per my previous message, I've spent about 3 months trying to debug a problem that was causing all disk I/O to go very slowly. One of the things which made this nearly impossible to diagnose was the absolute lack of priority given to the console. Logging in on the console would take 12-15 minutes. Hitting enter on the console would usually take between 3 and 5 minutes.

This doesn't seem right to me. Can someone explain why the console isn't given a very high priority? Why not? What other mechanism does the sysadmin have for debugging, at a time when SSH logins either fail, or take up to an hour to complete?
smartd long self-test causes drives to hang
I've spent about 3 months tracing down what was causing my personal colo box to start getting "sluggish" right around dawn every Saturday morning. It took so long because some mornings I simply couldn't pull my head out of my tail enough to do proper debugging.

The cause was *really slow* filesystem response time. No cron jobs in that period. No specific process ran any slower than another, although I eventually learned that ones which did no file i/o were fine. And finally I realized that just "ls -la" was very slow (~1 minute) even after I had killed off every disk-using process in the system. SMTP and HTTP in particular were basically fubar. No data loss, just *real slow*.

Nothing other than a soft reboot ever solved the problem. Even leaving it running only minimal processes for 24 hours didn't bring it back to normal.

Finally I was browsing through Jeremy Chadwick's list of known ATA problems and spotted his comments about smartd self-tests causing problems. Sure enough, my long self test was scheduled for 5am on Saturday mornings. Rechecking the observed slow-down periods confirmed that the problem never became visible before 5am. (Sometimes it took up to 45 minutes before things slowed down enough to set off monitoring alarms.)

So, long story short, if you're having weirdness in system response time, check the smartd configuration and try disabling the self tests. The short self test I was running daily didn't appear to affect anything, but the long test left the system shuddering and limping at best.
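
For reference, a smartd.conf sketch of the schedule described above and of the same line with the long test dropped. Several things here are assumptions rather than details from the message: the device name /dev/ad0, the config file path, the -a monitoring flag, and the 02:00 hour for the daily short test (only the Saturday 5am long test was stated). The -s pattern has the form T/MM/DD/d/HH, with day-of-week 6 meaning Saturday.

```
# /usr/local/etc/smartd.conf  (path assumed for a ports install)

# Before: short test daily at 02:00 (hour assumed), long test Saturdays at 05:00
#/dev/ad0 -a -s (S/../.././02|L/../../6/05)

# After: keep the daily short test, no long test
/dev/ad0 -a -s S/../.././02
```

smartd has to be restarted (or sent SIGHUP) before a changed schedule takes effect.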
Can I get a committer to mark this bug as blocking 6.4-RELEASE ?
This is now filed as PR 129149

http://www.freebsd.org/cgi/query-pr.cgi?pr=129149

Given the nature of this bug, can I persuade someone to mark this as blocking 6.4-RELEASE ?

On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>> via serial (keyboard detached), but this gathered me even less.
>>>
>>> /boot.config: -Dh
>>> Consoles: internal video/keyboard serial port
>>> BIOS drive A: is disk0
>>> BIOS drive C: is disk1
>>> BIOS drive D: is disk2
>>> BIOS 639kB/4062144kB available memory
>>>
>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>> ([EMAIL PROTECTED]
>>>
>>> Plugging back in the monitor after lockup showed only a single char more:
>>>
>>> ([EMAIL PROTECTED]
>>
>> This confirms it is hanging in one of the two BIOS routines to output a
>> character. One thing you can do would be to boot up and do the following:
>>
>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>
>> Then place those files some place I can fetch them.
>
> Both files are at http://support.netconsonance.com/freebsd/
>
> FYI, this is notable -- the keyboard does not respond at the boot
> prompt. I mean the menu where you can escape to the loader prompt,
> with the fat freebsd ascii art. No keyboard presses are observed
> here. This is also true for the boot menu on the 6.4 installation CD too.
>
> No problems with 6.2 or 6.3

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness
Re: smartd long self-test causes drives to hang
On re-reading the message I realized that my message was in danger of being content-free.

gmirror whole-disk mirror of Seagate 300GB drives

$ atacontrol list
ATA channel 0:
    Master: ad0 ATA/ATAPI revision 7
    Slave: ad1 ATA/ATAPI revision 7

$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 575427344
Providers:
1. Name: mirror/gm0
   Mediasize: 300069051904 (279G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad0
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3917165570
2. Name: ad1
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3874187635
Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jo Rhett wrote:
> This is now filed as PR 129149
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=129149
>
> Given the nature of this bug, can I persuade someone to mark this as
> blocking 6.4-RELEASE ?

My wild guess is that this is somehow related to SMP handling, since the installation process installs an SMP kernel, but the default CD-ROM kernel is UP for 6.x. Could you please check whether you have the same problem with a UP kernel? (Copy it from the LiveCD or something.)

> On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
>> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>>> via serial (keyboard detached), but this gathered me even less.
>>>>
>>>> /boot.config: -Dh
>>>> Consoles: internal video/keyboard serial port
>>>> BIOS drive A: is disk0
>>>> BIOS drive C: is disk1
>>>> BIOS drive D: is disk2
>>>> BIOS 639kB/4062144kB available memory
>>>>
>>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>>> ([EMAIL PROTECTED]
>>>>
>>>> Plugging back in the monitor after lockup showed only a single char
>>>> more:
>>>> ([EMAIL PROTECTED]
>>>
>>> This confirms it is hanging in one of the two BIOS routines to output a
>>> character. One thing you can do would be to boot up and do the
>>> following:
>>>
>>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>>
>>> Then place those files some place I can fetch them.
>>
>> Both files are at http://support.netconsonance.com/freebsd/
>>
>> FYI, this is notable -- the keyboard does not respond at the boot
>> prompt. I mean the menu where you can escape to the loader prompt,
>> with the fat freebsd ascii art. No keyboard presses are observed
>> here. This is also true for the boot menu on the 6.4 installation CD
>> too.
>>
>> No problems with 6.2 or 6.3
>>
>> --
>> Jo Rhett
>> Net Consonance : consonant endings by net philanthropy, open source
>> and other randomness

- --
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)

iEYEARECAAYFAkkrIc8ACgkQi+vbBBjt66BVUACcDLDK7Ubugt2sto8WKAYfxF0L
93cAoI3bJ/7YcKQeVUmWTO9R2tOCOf6W
=dEk9
-----END PGP SIGNATURE-----
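
For readers following the thread: the two dd commands quoted above copy the first 0x400 bytes of physical memory and the 64 KB that starts at offset 15 x 64 KB = 0xF0000. On a PC these are conventionally the real-mode interrupt vector table (256 entries of 4 bytes, stored as offset then segment, little-endian) and the system BIOS ROM area. The following is only a minimal Python sketch of how a vector could be decoded from idt.out under that assumption; the function names are hypothetical and nothing here is taken from the thread's actual analysis.

```python
import struct

IVT_SIZE = 0x400                  # bytes copied by: dd bs=0x400 count=1
BIOS_DUMP_BASE = 15 * 64 * 1024   # start of: dd bs=64k iseek=15 count=1
BIOS_DUMP_SIZE = 64 * 1024


def read_vector(ivt, n):
    """Return (segment, offset) of interrupt vector n from a raw IVT dump.

    Assumes the standard real-mode layout: entry n lives at byte 4*n and
    holds a 16-bit offset followed by a 16-bit segment, little-endian.
    """
    if len(ivt) < IVT_SIZE:
        raise ValueError("dump is shorter than 0x400 bytes")
    if not 0 <= n <= 0xFF:
        raise ValueError("vector number must be in 0..255")
    offset, segment = struct.unpack_from("<HH", ivt, 4 * n)
    return segment, offset


def linear_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


def points_into_bios_dump(segment, offset):
    """True if the handler address falls inside the 0xF0000 64 KB dump."""
    addr = linear_address(segment, offset)
    return BIOS_DUMP_BASE <= addr < BIOS_DUMP_BASE + BIOS_DUMP_SIZE
```

With a real dump one would read the file in binary mode, for example open("idt.out", "rb").read(), and look up a vector of interest such as 0x10 (conventionally the BIOS video service); which vectors matter for this hang is not stated in the thread.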
Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?
So boot from CD, go to LIVE filesystem, mount my root and copy only /boot/kernel? Are there any other modules I should copy, or settings I should change?

On Nov 24, 2008, at 1:51 PM, Xin LI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jo Rhett wrote:
>> This is now filed as PR 129149
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=129149
>>
>> Given the nature of this bug, can I persuade someone to mark this as
>> blocking 6.4-RELEASE ?
>
> My wild guess is that this is somehow related to SMP handling since the
> installation process would install a SMP kernel, but the default CD-ROM
> kernel is UP for 6.x. Could you please try if you have the same problem
> with UP kernel? (Copy from LiveCD or something)
>
>> On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
>>> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>>>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>>>> via serial (keyboard detached), but this gathered me even less.
>>>>>
>>>>> /boot.config: -Dh
>>>>> Consoles: internal video/keyboard serial port
>>>>> BIOS drive A: is disk0
>>>>> BIOS drive C: is disk1
>>>>> BIOS drive D: is disk2
>>>>> BIOS 639kB/4062144kB available memory
>>>>>
>>>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>>>> ([EMAIL PROTECTED]
>>>>>
>>>>> Plugging back in the monitor after lockup showed only a single char
>>>>> more:
>>>>> ([EMAIL PROTECTED]
>>>>
>>>> This confirms it is hanging in one of the two BIOS routines to output a
>>>> character. One thing you can do would be to boot up and do the
>>>> following:
>>>>
>>>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>>>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>>>
>>>> Then place those files some place I can fetch them.
>>>
>>> Both files are at http://support.netconsonance.com/freebsd/
>>>
>>> FYI, this is notable -- the keyboard does not respond at the boot
>>> prompt. I mean the menu where you can escape to the loader prompt,
>>> with the fat freebsd ascii art. No keyboard presses are observed
>>> here. This is also true for the boot menu on the 6.4 installation CD
>>> too.
>>>
>>> No problems with 6.2 or 6.3
>>>
>>> --
>>> Jo Rhett
>>> Net Consonance : consonant endings by net philanthropy, open source
>>> and other randomness
>
> - --
> Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (FreeBSD)
>
> iEYEARECAAYFAkkrIc8ACgkQi+vbBBjt66BVUACcDLDK7Ubugt2sto8WKAYfxF0L
> 93cAoI3bJ/7YcKQeVUmWTO9R2tOCOf6W
> =dEk9
> -----END PGP SIGNATURE-----
Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jo Rhett wrote:
> So boot from CD, go to LIVE filesystem, mount my root and copy only
> /boot/kernel?

Yes.

> Are there any other modules I should copy, or settings I should change?

You should probably replace the whole /boot/kernel directory, i.e. rename the installed /boot/kernel to /boot/kernel.old first and then copy the CD's /boot/kernel into place.

BTW, could you also test whether 7.1-PRERELEASE exhibits the same issue?

> On Nov 24, 2008, at 1:51 PM, Xin LI wrote:
>> Jo Rhett wrote:
>>> This is now filed as PR 129149
>>>
>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=129149
>>>
>>> Given the nature of this bug, can I persuade someone to mark this as
>>> blocking 6.4-RELEASE ?
>>
>> My wild guess is that this is somehow related to SMP handling since the
>> installation process would install a SMP kernel, but the default CD-ROM
>> kernel is UP for 6.x. Could you please try if you have the same problem
>> with UP kernel? (Copy from LiveCD or something)
>>
>>> On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
>>>> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>>>>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>>>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>>>>> via serial (keyboard detached), but this gathered me even less.
>>>>>>
>>>>>> /boot.config: -Dh
>>>>>> Consoles: internal video/keyboard serial port
>>>>>> BIOS drive A: is disk0
>>>>>> BIOS drive C: is disk1
>>>>>> BIOS drive D: is disk2
>>>>>> BIOS 639kB/4062144kB available memory
>>>>>>
>>>>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>>>>> ([EMAIL PROTECTED]
>>>>>>
>>>>>> Plugging back in the monitor after lockup showed only a single char
>>>>>> more:
>>>>>> ([EMAIL PROTECTED]
>>>>>
>>>>> This confirms it is hanging in one of the two BIOS routines to
>>>>> output a
>>>>> character. One thing you can do would be to boot up and do the
>>>>> following:
>>>>>
>>>>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>>>>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>>>>
>>>>> Then place those files some place I can fetch them.
>>>>
>>>> Both files are at http://support.netconsonance.com/freebsd/
>>>>
>>>> FYI, this is notable -- the keyboard does not respond at the boot
>>>> prompt. I mean the menu where you can escape to the loader prompt,
>>>> with the fat freebsd ascii art. No keyboard presses are observed
>>>> here. This is also true for the boot menu on the 6.4 installation CD
>>>> too.
>>>>
>>>> No problems with 6.2 or 6.3
>>>>
>>>> --
>>>> Jo Rhett
>>>> Net Consonance : consonant endings by net philanthropy, open source
>>>> and other randomness

- --
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)

iEYEARECAAYFAkkrKMoACgkQi+vbBBjt66AARgCbBHYl8WpX4jjoJrRbrKjJUMPg
lvsAnRlA6be6C62yQNrmNdLhWbOsCBAF
=DiYt
-----END PGP SIGNATURE-----
RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
A box running a fresh RELENG_7 panics under heavy network load (more than 50k connections). The panic seems to be sendfile(2)-related, because with sendfile disabled in nginx I can't reproduce the problem.

The backtrace looks like this in all cases:

# kgdb kernel /spool/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: vm_page_unwire: invalid wire count: 0
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
vm_page_unwire() at vm_page_unwire+0x84
sf_buf_mext() at sf_buf_mext+0x3c
mb_free_ext() at mb_free_ext+0x99
sbdrop_internal() at sbdrop_internal+0x1e8
tcp_do_segment() at tcp_do_segment+0x1512
tcp_input() at tcp_input+0x7f7
ip_input() at ip_input+0xa8
ether_demux() at ether_demux+0x1b4
ether_input() at ether_input+0x1bb
bge_intr() at bge_intr+0x3ca
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xea28fd30, rbp = 0 ---
Uptime: 36m47s
Physical memory: 4087 MB
Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5

#0 doadump () at pcpu.h:195
195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0x8031adf8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0x8031b25c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0x8044a084 in vm_page_unwire (m=Variable "m" is not available.
) at /usr/src/sys/vm/vm_page.c:1410
#4 0x80379a4c in sf_buf_mext (addr=Variable "addr" is not available.
) at /usr/src/sys/kern/uipc_syscalls.c:1720
#5 0x8036e9c9 in mb_free_ext (m=0xff0081f93d00) at /usr/src/sys/kern/uipc_mbuf.c:257
#6 0x80372c38 in sbdrop_internal (sb=0xff00b4161458, len=2896) at mbuf.h:515
#7 0x803d6532 in tcp_do_segment (m=0xff0075c23b00, th=0xff0075c53024, so=0xff00b41612d0, tp=0xff00b4154b60, drop_hdrlen=52, tlen=0) at /usr/src/sys/netinet/tcp_input.c:2042
#8 0x803d7bc7 in tcp_input (m=0xff0075c23b00, off0=20) at /usr/src/sys/netinet/tcp_input.c:846
#9 0x803cf108 in ip_input (m=0xff0075c23b00) at /usr/src/sys/netinet/ip_input.c:665
#10 0x803b8004 in ether_demux (ifp=0xff0001255800, m=0xff0075c23b00) at /usr/src/sys/net/if_ethersubr.c:834
#11 0x803b825b in ether_input (ifp=0xff0001255800, m=0xff0075c23b00) at /usr/src/sys/net/if_ethersubr.c:692
#12 0x801bcf5a in bge_intr (xsc=Variable "xsc" is not available.
) at /usr/src/sys/dev/bge/if_bge.c:3160
#13 0x802fb5f0 in ithread_loop (arg=0xff0003711840) at /usr/src/sys/kern/kern_intr.c:1088
#14 0x802f7f7f in fork_exit (callout=0x802fb470 , arg=0xff0003711840, frame=0xea28fc80) at /usr/src/sys/kern/kern_fork.c:804
#15 0x8045b88e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:455
#16 0x in ?? ()
#17 0x in ?? ()
#18 0x0001 in ?? ()

In /boot/loader.conf I have:

vm.kmem_size=1536M
# 2 Mb KVA/kmem
net.inet.tcp.tcbhashsize=131072
# 64M KVA
kern.maxbcache=64M
# 4M KVA
kern.ipc.maxpipekva=4M
#
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100

In /etc/sysctl.conf:

# 576 Mb KVA/kmem
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=65536
kern.ipc.maxsockets=307200
kern.ipc.somaxconn=4096
kern.maxfiles=307200
kern.maxfilesperproc=102400

$ sysctl vm.kvm_free
vm.kvm_free: 327151616

netstat -m output, several seconds before the panic:

380270/63895/444165 mbufs in use (current/cache/total)
14141/29273/43414/262144 mbuf clusters in use (current/cache/total/max)
14141/29251 mbuf+clusters out of packet secondary zone in use (current/cache)
0/9/9/65536 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
123349K/74555K/197905K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
1 requests for I/O initiated by sendfile
0 calls to protocol drain routines

--
Anton Yuzhaninov
Re: MFC ZFS: when?
On Fri, 21 Nov 2008, Zaphod Beeblebrox wrote:

> In several of the recent ZFS posts, multiple people have asked when this will be MFC'd to 7.x. This query has been studiously ignored as other chatter about whatever ZFS issue is discussed.

Presumably the MFC schedule is largely up to Pawel, who did the work. However, Pawel was traveling last weekend and week, attending MeetBSD and the FreeBSD developer summit in the Bay Area, and hasn't been seen on stable@ since the 17th. I think it's likely not so much that anyone is being studiously ignored as that the person who can best answer the question hasn't been keeping up with the list for a bit.

Robert N M Watson
Computer Laboratory
University of Cambridge

> So in a post with no other bug report or discussion content to distract us, when is it intended that ZFS be MFC'd to 7.x?
Re: MFC ZFS: when?
The problem appears to be that the latest ZFS commit in 8-CURRENT relies on too many other new features that aren't in 7.1. After 7.1 is released, perhaps ZFS and the other new code it requires can be moved into 7-STABLE?

- Andrew
Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
On 25.11.2008 01:48, Anton Yuzhaninov wrote:
> Box with fresh RELENG_7 panic under heavy network load (more than 50k connections).
> This panics seems to be senfile(2) related, because when sendfile disabled in nginx,
> I can't reproduce the problem.
>
> Backtrace in all cases like this:
>
> # kgdb kernel /spool/crash/vmcore.1
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> panic: vm_page_unwire: invalid wire count: 0
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x182
> vm_page_unwire() at vm_page_unwire+0x84
> sf_buf_mext() at sf_buf_mext+0x3c
> mb_free_ext() at mb_free_ext+0x99
> sbdrop_internal() at sbdrop_internal+0x1e8
> tcp_do_segment() at tcp_do_segment+0x1512
> tcp_input() at tcp_input+0x7f7
> ip_input() at ip_input+0xa8
> ether_demux() at ether_demux+0x1b4
> ether_input() at ether_input+0x1bb
> bge_intr() at bge_intr+0x3ca
> ithread_loop() at ithread_loop+0x180
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xea28fd30, rbp = 0 ---
> Uptime: 36m47s
> Physical memory: 4087 MB
> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5
>
> #0 doadump () at pcpu.h:195
> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb) bt
> #0 doadump () at pcpu.h:195
> #1 0x8031adf8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2 0x8031b25c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3 0x8044a084 in vm_page_unwire (m=Variable "m" is not available.
> ) at /usr/src/sys/vm/vm_page.c:1410
> #4 0x80379a4c in sf_buf_mext (addr=Variable "addr" is not available.
> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
> #5 0x8036e9c9 in mb_free_ext (m=0xff0081f93d00) at /usr/src/sys/kern/uipc_mbuf.c:257

Maybe it is a wire_count integer overflow? The type of wire_count is u_short...

--
Anton Yuzhaninov
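
The overflow hypothesis raised at the end of this message can be illustrated in isolation. The following is only a sketch in Python, assuming that u_short is a 16-bit unsigned type and that every wire adds one and every unwire subtracts one; the function names are hypothetical, and nothing here shows that this is what actually happens in the kernel. It demonstrates that after 65536 increments such a counter reads 0 again, so the next decrement would trip a "wire count is zero" check.

```python
U_SHORT_MASK = 0xFFFF  # assumption: u_short is 16 bits wide


def wire(count):
    """Increment a simulated 16-bit unsigned counter, wrapping silently."""
    return (count + 1) & U_SHORT_MASK


def unwire(count):
    """Decrement the counter, refusing to go below zero (mimics the panic)."""
    if count == 0:
        raise RuntimeError("invalid wire count: 0")
    return count - 1


def wire_many(times):
    """Apply wire() the given number of times starting from zero."""
    count = 0
    for _ in range(times):
        count = wire(count)
    return count
```

Under these assumptions wire_many(65535) still returns 65535, while wire_many(65536) returns 0, and a following unwire(0) raises. Whether a single page can really be wired that many times under the reported load is exactly the open question in the message.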
FreeBSD 7.0-STABLE Jul 23: panic: ffs_blkfree: freeing free frag
I have a box I am using for hosting jailed web servers. I did a test move of a jail from a FreeBSD 6 box to the FreeBSD 7 server, web1.hosting. It took forever, 30 minutes to be exact, to create the jail with the 3GB image file and restore the data from the FreeBSD 6 box into it.

I created a test archive of the running jail with ezjail-admin on the FreeBSD 6 box and scp'd it to web1.hosting. That took 5 minutes. (I was timing all of this to estimate how long it would really take later.)

Once the archive was on web1.hosting, I created the new jail using the archive to populate it.

sudo ezjail-admin create -a test_host_tcworks_net-200811241856.40.tar.gz \
    -s 3G -i testhost.tcworks.net 192.168.1.238

That step took 40 minutes. According to 'systat -vm 1', da0 tended to show around 90% utilization, da1 was about 23% and MB/s was about 1.6 for both during the creation of the jail.

After about 20 to 40 minutes of ensuring that the jail was working properly with the compat6x libs, I decided to erase the test jail and get ready for doing the transfer for real during the next maintenance window.

Just before the box stopped responding to me, I had run:

sudo ezjail-admin delete -w testhost.tcworks.net

It might have been about 30 seconds after that I noticed it wasn't responding. According to Nagios, it took about 25 minutes to panic, reboot, fsck and come back up. Funny, it felt a lot longer.

The gmirror is currently degraded and 'systat -vm 1' is showing 98% utilization on da0 and 23% utilization on da1 with 35 to 50MB/s on both da0 and da1. I hadn't looked at the mirror status before the crash.

21:42:09 Mon Nov 24
$ gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  da0 (84%)
                      da1

I think I'll wait for it to complete the rebuild before I put any disk load on it looking for when it degraded, if not during the crash.

21:53:34 Mon Nov 24
# gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  da0
                      da1

The disks appear quiet in systat, as expected. I don't find any messages except for when it booted up. I think the mirror was whole before the crash. The console log files go back to July 21 2008. The messages log files only go back to Nov 22. I need to fix that.

The syslog messages about gm0, the kgdb output, and /var/run/dmesg.boot are below. If you want anything else, please let me know.

22:15:02 Mon Nov 24
$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 149269652
Providers:
1. Name: mirror/gm0
   Mediasize: 146815737344 (137G)
   Sectorsize: 512
   Mode: r6w6e7
Consumers:
1. Name: da0
   Mediasize: 146815737856 (137G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 779766152
2. Name: da1
   Mediasize: 146815737856 (137G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1224070577

21:58:06 Mon Nov 24
$ sudo cat /var/log/console.log | grep gm0
Nov 24 21:01:23 web1 kernel: kernel dumps on /dev/mirror/gm0s1b
Nov 24 21:01:23 web1 kernel: swapon: adding /dev/mirror/gm0s1b as swap device
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1a: 3505 files, 133813 used, 120002 free (2498 frags, 14688 blocks, 1.0% fragmentation)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=967747 OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 18:51 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=967754 OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 19:02 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=1978373 OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=3221225472 MTIME=Nov 24 20:42 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: ZERO LENGTH DIR I=1978491 OWNER=root MODE=40755
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Oct 28 18:38 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=1978492 OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 18:58 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=2596868 OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109973 OWNER=mysql MODE=100600
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109974 OWNER=mysql MODE=100600
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008 (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109975 OWNE
ioctl DIOCSMBR: Inappropriate ioctl for device
Hi,

I am working on a nanobsd-derived system for updating an embedded pfSense image. The disk is divided into 4 partitions, two of which are similar "code" partitions. Only one of the two code partitions is live at any moment. To update, the new image is written to the other code partition, and a command like

boot0cfg -s 2 -v ad2

is run to boot from the new partition. Instead of using device names I am using bsdlabel and refer to the disks by label in fstab.

Current partitions are as follows:

nanoimg:~# fdisk ad2
*** Working on device /dev/ad2 ***
parameters extracted from in-core disklabel are:
cylinders=1999 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1999 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 32, size 239584 (116 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 467/ head 15/ sector 32
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 239648, size 239584 (116 Meg), flag 0
        beg: cyl 468/ head 1/ sector 1;
        end: cyl 935/ head 15/ sector 32
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 479232, size 2048 (1 Meg), flag 0
        beg: cyl 936/ head 0/ sector 1;
        end: cyl 939/ head 15/ sector 32
The data for partition 4 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 481280, size 20480 (10 Meg), flag 0
        beg: cyl 940/ head 0/ sector 1;
        end: cyl 979/ head 15/ sector 32

dmesg shows the following when booting:

ad2: 983MB at ata1-master PIO4
GEOM: ad2: partition 4 does not start on a track boundary.
GEOM: ad2: partition 4 does not end on a track boundary.
GEOM: ad2: partition 3 does not start on a track boundary.
GEOM: ad2: partition 3 does not end on a track boundary.
GEOM: ad2: partition 2 does not start on a track boundary.
GEOM: ad2: partition 2 does not end on a track boundary.
GEOM: ad2: partition 1 does not start on a track boundary.
GEOM: ad2: partition 1 does not end on a track boundary.
GEOM_LABEL: Label for provider ad2s3 is ufs/cfg.
GEOM_LABEL: Label for provider ad2s4 is ufs/cf.
GEOM_LABEL: Label for provider ad2s1a is ufs/root0.
GEOM_LABEL: Label for provider ad2s2a is ufs/root1.
Trying to mount root from ufs:/dev/ufs/root0

Fstab is:

/dev/ufs/root0 /    ufs ro        1 1
/dev/ufs/cfg   /cfg ufs rw,noauto 2 2
/dev/ufs/cf    /cf  ufs ro        1 1

Both ad2s1a and ad2s2a are active, and they appear on the boot screen as F1 and F2. I can manually press F1 or F2 and boot from either of them. But when I give the command

boot0cfg -s 1 -v ad2

I get

boot0cfg: /dev/ad2: Class not found
boot0cfg: /dev/ad2: ioctl DIOCSMBR: Inappropriate ioctl for device

I have searched Google and the archives and could not find much about this error. Any help in resolving this would be much appreciated.

with regards,
raj
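
As a side note on the GEOM warnings in the dmesg output above, they are consistent with the fdisk figures: with 63 sectors per track, none of the four start sectors, and none of the end positions (start + size), is a multiple of 63. The Python sketch below checks this arithmetic. It assumes that "on a track boundary" simply means divisible by sectors/track, which is a guess at the kernel's test and not taken from its source, and it says nothing about the boot0cfg error itself. The function and variable names are hypothetical.

```python
SECTORS_PER_TRACK = 63  # from the fdisk output: sectors/track=63

# (start, size) in sectors for partitions 1..4, copied from the fdisk output
PARTITIONS = [
    (32, 239584),
    (239648, 239584),
    (479232, 2048),
    (481280, 20480),
]


def on_track_boundary(sector, sectors_per_track=SECTORS_PER_TRACK):
    """Assumed test: a sector is on a boundary if divisible by sectors/track."""
    return sector % sectors_per_track == 0


def alignment_report(partitions, sectors_per_track=SECTORS_PER_TRACK):
    """Return (number, start_aligned, end_aligned) for each partition."""
    report = []
    for number, (start, size) in enumerate(partitions, start=1):
        report.append((
            number,
            on_track_boundary(start, sectors_per_track),
            on_track_boundary(start + size, sectors_per_track),
        ))
    return report


if __name__ == "__main__":
    for number, start_ok, end_ok in alignment_report(PARTITIONS):
        print(f"partition {number}: start aligned={start_ok}, end aligned={end_ok}")
```

For the table above every entry comes out unaligned, matching the eight warnings; a layout such as start 63 with size 126 would pass both checks under the same assumption.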
nfs unreachable: can't get /dev/console for controlling terminal
Hi,

besides the wrong order of initializing syslogd in the rc system when IPv6 has been enabled, I have found a second, similar problem with the rc system on my client desktops.

When you physically detach the NIC, or the wireless access point becomes unreachable, and an NFS file system listed in fstab is mounted over that link, the system cannot get access to /dev/console and won't even start in single-user mode. This is extremely annoying.

Nov 25 07:58:11 zelda init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Nov 25 07:58:11 zelda init: can't get /dev/console for controlling terminal: Operation not permitted
Nov 25 07:58:42 zelda init: can't get /dev/console for controlling terminal: Operation not permitted
Nov 25 08:00:13 zelda last message repeated 3 times
Nov 25 08:01:15 zelda last message repeated 2 times

--
Martin