Re: Curious failure of ZFS snapshots

2008-11-24 Thread Gerrit Kühn
On Fri, 21 Nov 2008 08:16:35 -0800 Freddie Cash <[EMAIL PROTECTED]> wrote
about Re: Curious failure of ZFS snapshots:

FC> > GK> mclane# ll /tank/home/pt/.zfs/
FC> > GK> ls: snapshot: Bad file descriptor
FC> > GK> total 0

FC> Which shell are you using?  I've seen quite a few 
FC> different "non-existent"/"invalid directory" errors when using tcsh
FC> to navigate through the .zfs/ hierarchy.  Can do "cd ..", "ls .", or
FC> tab completion when in anything under .zfs/

Standard root login, so it's /bin/csh.
I cannot remember if I tried to cd into the dir, and after rebooting
everything's fine up to now. I will try this if I see the problem again.
However, it would be rather strange if this was shell-dependent, as all
other snapshots were happily accessible with csh (and the panic after
trying to unmount the fs is definitely not an expected behaviour
either :-). 

FC> Using sh or zsh, these errors don't occur.
FC> Just curious if this is the same kind of thing.

I will try it when I see the problem next time.
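
One way to check whether the failure really depends on the shell is to run
the same listing from several shells in turn. The following is only a sketch:
try_shells is a hypothetical helper, not something from this thread, and it
only exercises a plain "ls", not "cd .." or tab completion.

```shell
# try_shells DIR SHELL...: run "ls DIR" from each shell and report the result.
# Hypothetical helper for illustration; DIR must not contain a single quote.
try_shells() {
    dir=$1
    shift
    for sh in "$@"; do
        # Output is discarded by the calling shell, so csh syntax is not an issue.
        if "$sh" -c "ls '$dir'" >/dev/null 2>&1; then
            echo "$sh: ok"
        else
            echo "$sh: failed"
        fi
    done
}

# Example (path taken from the report above):
#   try_shells /tank/home/pt/.zfs/snapshot /bin/sh /bin/csh
```

If every shell fails on the same directory, the shell is probably not the
cause.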


cu
  Gerrit
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: R: Re: R: Re: 6.4-RC2 crashes after a few minutes of uptime

2008-11-24 Thread Rory Arms


On 2008-11-24, at 1:51 , Barbara wrote:

About kgdb...
I never used freebsd-update, so sorry if I'm saying something
stupid, but could it be the case that the kernel has been built
without debugging symbols or something like that? Does freebsd-update
provide a kernel.debug?

I haven't had to use the kernel.debug file in the obj dir in a long
time. As far as I know, these days, the GENERIC kernel includes debug
symbols. And in cases when there aren't any debug symbols, that
shouldn't prevent kgdb from loading, I wouldn't think.

Hello,

I had a kernel panic some hours ago but I think that's related to a
problem with one of my HDs.
I've got a dump in /var/crash, and as you were interested, I ran:

  # kgdb /boot/kernel/kernel /var/crash/vmcore.6
  GNU gdb 6.1.1 [FreeBSD]
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for details.
  This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
  Attempt to extract a component of a value that is not a structure pointer.
  Attempt to extract a component of a value that is not a structure pointer.
  Attempt to extract a component of a value that is not a structure pointer.
  Attempt to extract a component of a value that is not a structure pointer.
  Terminated

I had to pkill kgdb as it was in a loop.
Running it against kernel.debug in /usr/obj/usr/src/sys/$KERNCONF/
worked as expected.
I've always followed this way, so I don't know if it was working with
earlier releases.

Ah, well you must not be using GENERIC then, because it does have the
debugging symbols. I think this is the setting in the GENERIC config
that controls it:

makeoptions DEBUG=-g

But I guess what you're doing works if you're using a custom kernel
that does not have that config setting.

- rory

I'm not using GENERIC but I have
makeoptions DEBUG=-g
in my KERNCONF.

Barbara,

Ah, so you got exactly the same results I did when using
/boot/kernel/kernel. That answers the question, then: apparently I do
need to build a kernel.debug to get a backtrace on 6.4.


So, it looks like things may be different in 6 than I had remembered.
I haven't looked at the 6.4-RC2 notebook to see what the kernel
directory has, but on my 7.0 server at least, kgdb(1) does work with
/boot/kernel/kernel, and I think that has to do with the symbols being
put in a separate kernel.symbols file, which I assume doesn't exist on
6. However, I did notice that if I remove that file and run kgdb again
(on 7.0), I also get the structure pointer error that you get. It
doesn't lock up, and I can still get a backtrace, but the output is
more terse: it shows function names without the corresponding source
file names and line numbers. So the symbols file, it seems, adds more
debugging information than the kernel provides by itself.


So, maybe that makeoptions directive does different things on each
version.


Thank you for your feedback on this, much appreciated. Now to see
if I can build a kernel.debug on that machine and get a backtrace,
though it sure sounds like a problem with ata(4).


- rory
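
The choice discussed above (kernel.debug from the obj dir if it was built,
otherwise the installed kernel) could be scripted roughly as follows. This is
a sketch under assumptions: pick_kernel and its optional ROOT prefix are
hypothetical, MYKERNEL stands in for whatever KERNCONF is in use, and the
paths are the ones quoted in this thread.

```shell
# pick_kernel KERNCONF [ROOT]: print the kernel image to hand to kgdb.
# Preference order follows the thread: kernel.debug from the obj dir,
# otherwise the installed /boot/kernel/kernel. ROOT is a hypothetical
# prefix (default empty) that exists only to make the function testable.
pick_kernel() {
    kernconf=$1
    root=${2:-}
    debug="$root/usr/obj/usr/src/sys/$kernconf/kernel.debug"
    if [ -f "$debug" ]; then
        echo "$debug"
    else
        echo "$root/boot/kernel/kernel"
    fi
}

# Example (MYKERNEL is a placeholder name):
#   kgdb "$(pick_kernel MYKERNEL)" /var/crash/vmcore.6
```

On a system where the installed kernel lacks symbols, falling back to
/boot/kernel/kernel may still give the "no debugging symbols found" result
shown earlier.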


Re: shutdown -p now crashes

2008-11-24 Thread Ganbold

Ganbold wrote:

(kgdb) p *fsrootvp
$3 = {v_type = VDIR, v_tag = 0xc0864e51 "ufs", v_op = 0xc0926280, 
v_data = 0xc3e5d000, v_mount = 0xc3e56b30, v_nmntvnodes = {tqe_next = 
0xc3d119b4,
   tqe_prev = 0xc3e56b98}, v_un = {vu_mount = 0x0, vu_socket = 0x0, 
vu_cdev = 0x0, vu_fifoinfo = 0x0, vu_yield = 0}, v_hashlist = {le_next 
= 0x0,
   le_prev = 0xc3d09da0}, v_hash = 2, v_cache_src = {lh_first = 0x0}, 
v_cache_dst = {tqh_first = 0x0, tqh_last = 0xc3d11af8}, v_dd = 0x0, 
v_cstart = 0,
 v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lk_object = {lo_name 
= 0xc0864e51 "ufs", lo_type = 0xc0864e51 "ufs", lo_flags = 70844416,
 lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 
0x0}}, lk_interlock = 0xc0956510, lk_flags = 262208, lk_sharecount = 0,
   lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 
51, lk_lockholder = 0xc3b31d20, lk_newlock = 0x0}, v_interlock = 
{lock_object = {
 lo_name = 0xc086fb51 "vnode interlock", lo_type = 0xc086fb51 
"vnode interlock", lo_flags = 16973824, lo_witness_data = {lod_list = 
{stqe_next = 0x0},
   lod_witness = 0x0}}, mtx_lock = 3283295520, mtx_recurse = 0}, 
v_vnlock = 0xc3d11b20, v_holdcnt = 2, v_usecount = 0, v_iflag = 0, 
v_vflag = 1,
 v_writecount = 0, v_freelist = {tqe_next = 0x0, tqe_prev = 0x0}, 
v_bufobj = {bo_mtx = 0xc3d11b50, bo_clean = {bv_hd = {tqh_first = 
0xe3d02594,
   tqh_last = 0xe3d025cc}, bv_root = 0xe3d02594, bv_cnt = 1}, 
bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc3d11b9c}, bv_root 
= 0x0,
 bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_ops = 0xc091ae00, 
bo_bsize = 16384, bo_object = 0xc106183c, bo_synclist = {le_next = 0x0,
 le_prev = 0x0}, bo_private = 0xc3d11ac8, __bo_vnode = 
0xc3d11ac8}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0}

(kgdb) p rootvnode
$4 = (struct vnode *) 0x0
(kgdb) p *rootvnode
Cannot access memory at address 0x0
(kgdb)




Konstantin,

I have tried your patch. It seems to be working: I tried "shutdown -p
now" twice and my RELENG_7 system didn't crash after using a zfs/geli
external HDD via USB.
The attached patches are for RELENG_7 (small modifications were made
so that they apply to RELENG_7).


thanks a lot,

Ganbold


--
If you think education is expensive, try ignorance. -- Derek Bok, 
president of Harvard
--- opensolaris_kobj.c~ 2008-04-17 09:23:29.0 +0800
+++ opensolaris_kobj.c  2008-11-24 14:28:01.0 +0800
@@ -67,17 +67,25 @@
 kobj_open_file_vnode(const char *file)
 {
struct thread *td = curthread;
+   struct filedesc *fd;
struct nameidata nd;
int error, flags;
 
-   if (td->td_proc->p_fd->fd_rdir == NULL)
-   td->td_proc->p_fd->fd_rdir = rootvnode;
-   if (td->td_proc->p_fd->fd_cdir == NULL)
-   td->td_proc->p_fd->fd_cdir = rootvnode;
+   fd = td->td_proc->p_fd;
+   FILEDESC_XLOCK(fd);
+   if (fd->fd_rdir == NULL) {
+   fd->fd_rdir = rootvnode;
+   vref(fd->fd_rdir);
+   }
+   if (fd->fd_cdir == NULL) {
+   fd->fd_cdir = rootvnode;
+   vref(fd->fd_cdir);
+   }
+   FILEDESC_XUNLOCK(fd);
 
flags = FREAD;
-   NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, file, td);
-   error = vn_open_cred(&nd, &flags, 0, td->td_ucred, NULL);
+   NDINIT(&nd, LOOKUP, MPSAFE, UIO_SYSSPACE, file, td);
+   error = vn_open_cred(&nd, &flags, O_NOFOLLOW, td->td_ucred, NULL);
NDFREE(&nd, NDF_ONLY_PNBUF);
if (error != 0)
return (NULL);
@@ -122,12 +130,15 @@
struct thread *td = curthread;
struct vattr va;
int error;
-
+   int vfslocked;
+ 
+   vfslocked = VFS_LOCK_GIANT(vp->v_mount);
vn_lock(vp, LK_SHARED | LK_RETRY, td);
error = VOP_GETATTR(vp, &va, td->td_ucred, td);
VOP_UNLOCK(vp, 0, td);
if (error == 0)
*size = (uint64_t)va.va_size;
+   VFS_UNLOCK_GIANT(vfslocked);
return (error);
 }
 
@@ -161,6 +172,7 @@
struct uio auio;
struct iovec aiov;
int error;
+   int vfslocked;
 
bzero(&aiov, sizeof(aiov));
bzero(&auio, sizeof(auio));
@@ -176,9 +188,11 @@
auio.uio_resid = size;
auio.uio_td = td;
 
+   vfslocked = VFS_LOCK_GIANT(vp->v_mount);
vn_lock(vp, LK_SHARED | LK_RETRY, td);
error = VOP_READ(vp, &auio, IO_UNIT | IO_SYNC, td->td_ucred);
VOP_UNLOCK(vp, 0, td);
+   VFS_UNLOCK_GIANT(vfslocked);
return (error != 0 ? -1 : size - auio.uio_resid);
 }
 
@@ -213,8 +227,11 @@
struct vnode *vp = file->ptr;
struct thread *td = curthread;
int flags = FREAD;
-
+   int vfslocked;
+ 
+   vfslocked = VFS_LOCK_GIANT(vp->v_mount);
vn_close(vp, flags, td->td_ucred, td);
+   VFS_UNLOCK_GIANT(vfslocked);
}
kmem_free(file, sizeof(*file));
 }
--- vnode.h~2008-04-17 09:23:30.0 +0800
+++ vnode.h 2008-11-

Problem with Adaptec 29320LPE

2008-11-24 Thread Greg Byshenk
Is there a problem with the Adaptec 29320LPE (PCIe x1, single-channel Ultra320)
SCSI controller under FreeBSD 7?

I've recently received a server with this controller, which is intended to be
used to connect to Sony AIT tape libraries for backup. Unfortunately, it does
not seem to function properly.

It sees the connected devices without any difficulty, but fails to write to
any connected drives, and produces very strange errors when attempting to
address the libraries. That is, when attempting to write to a drive, the
drive is seen as present, but any attempt actually to write results in an
error (an end of tape is reported) without any data being written (mt status
reports the tape at File Number 0, Record number 0).

Additionally, attempting to address the changers produces erratic results.
Sometimes, the result is normal, but at other times the results are garbled,
and syslog reports a string of errors from the controller, followed by a
long string of errors on 'ch' (see below).

I am reasonably certain that the errors are not related to the tape
libraries, as a) the libraries worked normally on the old server, and
b) after installing a different controller (Adaptec 29160), the libraries
function properly on the new machine. And I am reasonably sure that the
problem is not a 320/160 problem, as setting the new controller to 160 in
the BIOS does not help.

The system is currently running FreeBSD 7.1-PRERELEASE: Wed Nov 19 11:33:15
CET 2008, from sources csup'ed immediately prior to the build. The kernel
is very close to GENERIC, but with various cardbus, wlan, and usb support
removed.

Searching has indicated some similar-looking errors reported, but all from
rather a long time ago (2000-2002).



backuphost# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,ch3)
   at scbus0 target 1 lun 0 (sa3,pass1)
at scbus0 target 2 lun 0 (pass2,ch4)
   at scbus0 target 3 lun 0 (sa4,pass3)
at scbus1 target 0 lun 0 (da0,pass4)
at scbus1 target 0 lun 1 (da1,pass5)

backuphost# chio -f /dev/ch2 status
picker 0:
slot 0: 
slot 1: 
slot 2: 
slot 3: 
slot 4: 
slot 5: 
slot 6: 
slot 7: 
slot 8: 
slot 9: 
slot 10: 
slot 11: 
slot 12: 
slot 13: 
slot 14: 
slot 15: 
drive 0: 
backuphost# chio -f /dev/ch2 status
picker 0:
slot 8: 
slot 9: 
slot 10: 
slot 11: 
slot 12: 
slot 13: 
slot 14: 
slot 15: 
slot 8: 
slot 9: 
slot 10: 
slot 11: 
slot 12: 
slot 13: 
slot 14: 
slot 0:
drive 0: 
backuphost#


Nov 20 17:53:08 backuphost kernel: ahd0:  port 0x4400-0x44ff, 0x4000-0x40ff mem 0xda60-0xda601fff irq 18 at device 4.0 on pci10
Nov 20 17:53:08 backuphost kernel: ahd0: [ITHREAD]
Nov 20 17:53:08 backuphost kernel: aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
Nov 20 15:01:16 backuphost kernel: ahd0: Transmission error detected
Nov 20 15:01:16 backuphost kernel: LQISTAT1[0x0] LASTPHASE[0x40]:(P_DATAIN) SCSISIGI[0x40]:(P_DATAIN)
Nov 20 15:01:16 backuphost kernel: PERRDIAG[0xd0]:(PARITYERR|HIPERR|HIZERO)
Nov 20 15:01:16 backuphost kernel: >> Dump Card State Begins <
Nov 20 15:01:16 backuphost kernel: ahd0: Dumping Card State at program address 0x3b Mode 0x22
Nov 20 15:01:16 backuphost kernel: Card was paused
Nov 20 15:01:16 backuphost kernel: INTSTAT[0x8]:(SCSIINT) SELOID[0x0] SELID[0x10] HS_MAILBOX[0x0]
Nov 20 15:01:16 backuphost kernel: INTCTL[0xc0]:(SWTMINTEN|SWTMINTMASK) SEQINTSTAT[0x10]:(SEQ_SWTMRTO)
Nov 20 15:01:16 backuphost kernel: SAVED_MODE[0x11] DFFSTAT[0x19]:(CURRFIFO_1|FIFO0FREE)
Nov 20 15:01:16 backuphost kernel: SCSISIGI[0xb6]:(P_MESGOUT|REQI|BSYI|ATNI) SCSIPHASE[0x4]:(MSG_OUT_PHASE)
Nov 20 15:01:16 backuphost kernel: SCSIBUS[0xc0] LASTPHASE[0x40]:(P_DATAIN) SCSISEQ0[0x0]
Nov 20 15:01:16 backuphost kernel: SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI) SEQCTL0[0x0] SEQINTCTL[0x0]
Nov 20 15:01:16 backuphost kernel: SEQ_FLAGS[0x20]:(DPHASE) SEQ_FLAGS2[0x0] QFREEZE_COUNT[0x40a]
Nov 20 15:01:16 backuphost kernel: KERNEL_QFREEZE_COUNT[0x40a] MK_MESSAGE_SCB[0xff00]
Nov 20 15:01:16 backuphost kernel: MK_MESSAGE_SCSIID[0xff] SSTAT0[0x2]:(SPIORDY) SSTAT1[0x11]:(REQINIT|PHASEMIS)
Nov 20 15:01:16 backuphost kernel: SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO)
Nov 20 15:01:16 backuphost kernel: LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0]
Nov 20 15:01:16 backuphost kernel: LQOSTAT1[0x0] LQOSTAT2[0x0]
Nov 20 15:01:16 backuphost kernel:
Nov 20 15:01:16 backuphost kernel: SCB Count = 512 CMDS_PENDING = 1 LASTSCB 0x CURRSCB 0x1ff NEXTSCB 0x0
Nov 20 15:01:16 backuphost kernel: qinstart = 4230 qinfifonext = 4230
Nov 20 15:01:16 backuphost kernel: QINFIFO:
Nov 20 15:01:16 backuphost kernel: WAITING_TID_QUEUES:
Nov 20 15:01:16 backuphost kernel: Pending list:
Nov 20 15:01:16 backuphost kernel: 511 FIFO_USE[0x0] SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x7]
Nov 20 15:01:16 backuphost kernel: Total 1
Nov 20 15:01:16 backuphost kernel: Ke

Re: Problem with Adaptec 29320LPE

2008-11-24 Thread Rink Springer
Hi Greg,

On Mon, Nov 24, 2008 at 12:42:49PM +0100, Greg Byshenk wrote:
> backuphost# camcontrol devlist
> at scbus0 target 0 lun 0 (pass0,ch3)
>at scbus0 target 1 lun 0 (sa3,pass1)
> at scbus0 target 2 lun 0 (pass2,ch4)
>at scbus0 target 3 lun 0 (sa4,pass3)
> at scbus1 target 0 lun 0 (da0,pass4)
> at scbus1 target 0 lun 1 (da1,pass5)

Are these volumes perhaps >2TB ? If so, it won't work...  we stumbled on
this at work a few weeks ago, and once we resized the volumes so that'd
all be <2TB, the controller worked fine...

As far as I know, this is the only workaround - I couldn't see relevant
patches in Open/NetBSD either that might have fixed this issue :-(

Regards,

-- 
Rink P.W. Springer- http://rink.nu
"Anyway boys, this is America. Just because you get more votes doesn't
 mean you win." - Fox Mulder


Re: Problem with Adaptec 29320LPE

2008-11-24 Thread Greg Byshenk
On Mon, Nov 24, 2008 at 12:49:12PM +0100, Rink Springer wrote:
> Hi Greg,
> 
> On Mon, Nov 24, 2008 at 12:42:49PM +0100, Greg Byshenk wrote:
> > backuphost# camcontrol devlist
> > at scbus0 target 0 lun 0 (pass0,ch3)
> >at scbus0 target 1 lun 0 (sa3,pass1)
> > at scbus0 target 2 lun 0 (pass2,ch4)
> >at scbus0 target 3 lun 0 (sa4,pass3)
> > at scbus1 target 0 lun 0 (da0,pass4)
> > at scbus1 target 0 lun 1 (da1,pass5)

> Are these volumes perhaps >2TB ? If so, it won't work...  we stumbled on
> this at work a few weeks ago, and once we resized the volumes so that'd
> all be <2TB, the controller worked fine...
> 
> As far as I know, this is the only workaround - I couldn't see relevant
> patches in Open/NetBSD either that might have fixed this issue :-(
 
The volume da1 is indeed >2TB, but it is not connected to the controller;
it (along with da0) is actually a RAID-10 array connected to a 3Ware/AMCC 
SATA controller.  The Adaptec controller is used only for the tape drives
(the SDX-900V is AIT4; the SDX-1100 is AIT5), and they are <2TB.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: Integrated RTL8168/8111 NIC not assigned interface

2008-11-24 Thread hamtilla

I upgraded to 7.1-PRERELEASE and it works now.

Thank you!



Peter C. Lai-2 wrote:
> 
> On 2008-11-22 08:09:31AM +1100, Peter Jeremy wrote:
>> On 2008-Nov-21 00:07:26 -0800, hamtilla <[EMAIL PROTECTED]> wrote:
>> >I'm running 7.0-RELEASE-i386 on Jetway's NC92-N230 mainboard. The board
>> has
>> >one integrated RTL8168/8111 gigabit NIC as well as an expansion board
>> with
>> >three RTL8168/8111 NICs. Why would the three NICs work while the onboard
>> NIC
>> >does not? 
>> >
>> >[EMAIL PROTECTED]:1:0:0:   class=0x02 card=0x816810ec 
>> >chip=0x816810ec
>> >rev=0x02 hdr=0x00
>> >vendor = 'Realtek Semiconductor'
>> >device = 'RTL8168/8111 PCI-E Gigabit Ethernet NIC'
>> >class  = network
>> >subclass   = ethernet
>> >[EMAIL PROTECTED]:2:4:0: class=0x02 card=0x10ec16f3 chip=0x816710ec 
>> >rev=0x10
>> >hdr=0x00
>> >vendor = 'Realtek Semiconductor'
>> >device = 'RTL8169/8110 Family Gigabit Ethernet NIC'
>> >class  = network
>> >subclass   = ethernet
>> ...
>> 
>> The on-board NIC is a different type to your expansion cards (note the
>> different 'chip=' values.  Looking at the code, it appears that only
>> some variants of the RTL8168 are supported in 7.x.  Unfortunately,
>> pciconf
>> doesn't report the actual hardware revision, so you can't tell from the
>> pciconf output whether it's supported or not.
>> 
>> Can you report the output of 'pciconf -r pci0:1:0:0 0x40' (which should
>> report the hw revision) and 'pciconf -r pci0:2:4:0 0x40' (which gives
>> me a double-check).
>> 
>> You could try booting -current and see if the on-board NIC works there -
>> the range of supported NICs has changed.
>> 
>> -- 
>> Peter Jeremy
>> Please excuse any delays as the result of my ISP's inability to implement
>> an MTA that is either RFC2821-compliant or matches their claimed
>> behaviour.
> 
> Yes, 7.0-R is pretty old in terms of re(4) work. I believe yongari@
> is still working on this driver. 7.1 is close enough for patching
> with patches from http://people.freebsd.org/~yongari/re/
> 
> Currently development is stifled because he has to basically guess
> the appropriate magic values for various PHY permutations in these 
> 8111C/8168C gigabit cards everyone seems to be putting in their
> motherboards these days.
> 
> -- 
> ===
> Peter C. Lai | Bard College at Simon's Rock
> Systems Administrator| 84 Alford Rd.
> Information Technology Svcs. | Gt. Barrington, MA 01230 USA
> peter AT simons-rock.edu | (413) 528-7428
> ===
> 
> 
> 
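
The 'chip=' values quoted above pack two IDs into one number: the low 16 bits
are the PCI vendor ID and the high 16 bits are the device ID. A small sketch
for splitting them (split_chip is a hypothetical helper name, not part of
pciconf):

```shell
# split_chip CHIP: split a pciconf "chip=" value into vendor and device IDs.
# The low 16 bits are the vendor ID, the high 16 bits the device ID.
split_chip() {
    chip=$(( $1 ))
    printf 'vendor=0x%04x device=0x%04x\n' \
        $(( chip & 0xffff )) $(( (chip >> 16) & 0xffff ))
}

# Values from the pciconf output quoted above:
#   split_chip 0x816810ec   # prints vendor=0x10ec device=0x8168
#   split_chip 0x816710ec   # prints vendor=0x10ec device=0x8167
```

Both parts share the Realtek vendor ID (0x10ec) and differ only in the device
ID, which matches the observation that the on-board NIC is a different type
from the expansion cards.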



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-11-24 Thread Michael Grant
On Thu, Sep 11, 2008 at 11:56 AM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
>> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
>> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
>> >> My box crashed again:
>> >>
>> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
>> >> cpuid = 0
>> >> Uptime: 33d11h12m58s
>> >> Dumping 3327 MB (2 chunks)
>> >>   chunk 0: 1MB (151 pages) ... ok
>> >>   chunk 1: 3327MB (851568 pages)  <---hung here
>> >>
>> >> Still no valid dump.
>> >>
>> >> There is 4gig of physical memory in the machine.
>> >>
>> >> In /boot/loader.conf, I currently have the following:
>> >>
>> >> vm.kmem_size=1G
>> >> vm.kmem_size_max=1G
>> >> vm.kmem_size_scale=2
>> >>
>> >> and in my kernel conf file I have:
>> >>
>> >> options KVA_PAGES=512
>> >>
>> >> It stayed up for 33 days this time.  Is there anything else I can do?
>> >
>> > First and foremost: are you using ZFS on this machine?  If so, there are
>> > many tunables you can apply to try and limit this; I'm willing to bet
>> > it's ARC which is doing it.  See below.
>> >
>> > In general, it appears that you need to increase the maximum range of
>> > kmem.  The kernel attempted to utilise more than 1GB, and your limit is
>> > 1G.  My machines running RELENG_7 on amd64, with only 2GB of RAM
>> > installed, use the following tunables in loader.conf:
>> >
>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"
>> >
>> > If ZFS is in use, I recommend these as well:
>> >
>> > vfs.zfs.arc_min="16M"
>> > vfs.zfs.arc_max="64M"
>> > vfs.zfs.prefetch_disable="1"
>> >
>> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
>> > have in the machine, with regards to RELENG_7, will not help.  This is a
>> > known limitation which has been fixed in HEAD/CURRENT (where the limit
>> > has been increased to 512GB).  See the "Kernel" section below; you'll
>> > see the applicable item.
>> >
>> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>> >
>> > Your only solution may be to run HEAD/CURRENT.
>>
>> I am not running ZFS.  My file systems are ufs.
>>
>> This feels like some sort of memory leak in the kernel.  Giving it
>> more and more memory just seems to delay the crash.  Are you saying
>> the crash is fixed in HEAD/CURRENT?
>
> It's an intentional crash, not "the program tried to access NULL, which
> crashed the machine" crash.  The kernel wants more memory to accomplish
> a certain thing, and it's not available.  kris@ can explain this in
> better terms than I can.
>
> First and foremost, it would be good to find out what all you are
> running on this machine (process-wise).  A process could be tickling
> something in the kernel which requires a large amount of memory to be
> required.  I can imagine something like MySQL would require this.
>
> Ideally what needs to happen is to debug the kernel or get a full map
> of kmem to find out what's using what.  I believe vmstat -m or vmstat -z
> output might help.
>
> Obviously since the machine panics, you won't be able to run those
> commands after the fact.  I would recommend you set up a cronjob that
> runs every 1-2 minutes and logs the output of both of those commands
> to a file.  When the panic happens, restart the system and look at
> the logfile to see if you can figure out if anything suddenly starts
> taking up a large amount of memory, or if it's a gradual thing
> (indicating a memory leak).
>
> If you can figure out what might be tickling the problem, you can
> ultimately figure out if increasing kmem is the right thing to do, or if
> there's a greater problem here.
>
>> I'm running 6.3 by the way.
>>
>> I have put your changes into my loader.conf, we'll see how long it
>> goes this time.  I'm not quite in a position to update everything to 7.x
>> at the moment.
>
> Our production webservers run RELENG_6 and RELENG_7, and we don't
> encounter this kind of problem.  I'm not saying what you're experiencing
> is indicative of hardware issues or something like that -- I'm simply
> saying I have loaded systems which don't ever hit that condition.  So
> figuring out what's causing it in your case would be good.
>

This appears to be too high as the machine reboots immediately after the fsck:

>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"

Returning it to 1G, it panics again about a month later.
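
For reference, the byte count in the panic message can be compared against
the loader.conf setting with a little arithmetic. A minimal sketch
(bytes_to_mb is a hypothetical helper name):

```shell
# bytes_to_mb BYTES: convert a byte count to whole mebibytes.
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Value from the panic message ("1073741824 total allocated"):
#   bytes_to_mb 1073741824   # prints 1024, i.e. exactly the configured 1G
# The 1536M setting corresponds to 1536 * 1024 * 1024 = 1610612736 bytes:
#   bytes_to_mb 1610612736   # prints 1536
```

So the panic fires when the whole configured kmem_map has been allocated, not
at some lower threshold.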

Here's vmstat -m and -z roughly 1 minute before it crashed (I was
logging to a file every minute via cron):

Fri Nov 21 15:15:00 EST 2008
 Type InUse MemUse HighUse Requests  Size(s)
  pfs_vncache 2 1K   -   864205  32
 GEOM   16824K   -   416279  16,32,64,128,256,512,1024,2048,4096
   isadev17 2K   -   17  64
   CAM periph 1 1K   -1  128
 cdev26 4K   -   26  128
CAM queue 3 1K   -3  16
file desc   739   4
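
A per-minute logger of the kind described above might look like the sketch
below. It is not the script actually used here: log_snapshot, the log file
name and the script path in the crontab line are all hypothetical.

```shell
# log_snapshot LOGFILE CMD...: append a timestamp and the output of each
# command to LOGFILE. Each CMD is a single string such as "vmstat -m".
log_snapshot() {
    logfile=$1
    shift
    {
        date
        for cmd in "$@"; do
            echo "== $cmd"
            # Deliberately unquoted so "vmstat -m" splits into command + flag.
            $cmd 2>&1
        done
        echo
    } >> "$logfile"
}

# Example call, as it might appear in a script run from cron:
#   log_snapshot /var/log/vmstat-kmem.log "vmstat -m" "vmstat -z"
# Matching crontab entry (script path is a placeholder):
#   * * * * * /bin/sh /root/vmstat-log.sh
```

Comparing the last few snapshots before a panic with ones from days earlier
should show whether one type grows gradually or jumps suddenly.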

ext2 inode size patch - RE: PR kern/124621

2008-11-24 Thread Josh Carroll
A while back, I submitted a patch for PR kern/124621, which allows the
mounting of an ext2(3) filesystem created with an inode size other
than 128.  The e2fsprogs' default is now 256, so file systems created
on newer Linux distributions or with the port will not be mountable.
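
To check which inode size an existing file system uses, the "Inode size"
field printed by tune2fs -l (from e2fsprogs) can be extracted. A sketch,
assuming that output format; inode_size is a hypothetical helper and the
device name is a placeholder:

```shell
# inode_size: read "tune2fs -l" output on stdin and print the value of the
# "Inode size" field.
inode_size() {
    awk -F: '$1 == "Inode size" { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example (device name is a placeholder):
#   tune2fs -l /dev/ad4s1 | inode_size
# When creating a NEW file system that an unpatched kernel must mount, the
# old inode size can be requested explicitly:
#   mke2fs -I 128 /dev/ad4s1
```

A result of 128 means the file system should mount without the patch; any
other value is the case this PR addresses.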

I was hopeful this would get committed in time for 7.1-RELEASE (and
6.4-RELEASE), however the PR remains open.

If there is an issue with the patch itself, I would be glad to fix it.
I'm posting to fs@ because hopefully some folks more experienced with
file system/kernel code can have a look and see if the patch is ok to
commit.

I've seen a few people in ##freebsdhelp on Freenode as well as
#freebsdhelp on EFnet with this problem, and have had them test this
patch out with success (and no obvious adverse effects), so I was
hoping it could be committed in time for 7.1-RELEASE. Since 6.4 is so
close to release, I'm not so sure about that.

Anyway, I would appreciate it if the patch could get some review to
see if it can be committed in time.

Regards,
Josh


no priority on the console?

2008-11-24 Thread Jo Rhett
As per my previous message, I've spent about 3 months trying to debug  
a problem that was causing all disk I/O to go very slowly.


One of the things which made this nearly impossible to diagnose was  
the absolute lack of priority given to the console.  Logging in on the  
console would take 12-15 minutes.  Hitting enter on the console would  
usually take between 3 and 5 minutes.


This doesn't seem right to me.  Can someone explain why the console  
isn't given a very high priority?  Why not?  What other mechanism does  
the sysadmin have for debugging, at a time when SSH logins either  
fail, or take up to an hour to complete?



smartd long self-test causes drives to hang

2008-11-24 Thread Jo Rhett
I've spent about 3 months tracking down what was causing my personal  
colo box to start getting "sluggish" right around dawn every Saturday  
morning.  It took so long because some mornings I simply couldn't pull  
my head out of my tail enough to do proper debugging.


The cause was *really slow* filesystem response time.  No cron jobs in  
that period.  No specific process ran any slower than another,  
although I eventually learned that ones which did no file i/o were  
fine.  And finally I realized that just "ls -la" was very slow (~1  
minute) even after I had killed off every disk-using process in the  
system.  SMTP and HTTP in particular were basically fubar.


No data loss, just *real slow*.  Nothing other than a soft reboot ever  
solved the problem.  Even leaving it running only minimal processes  
for 24 hours didn't bring it back to normal.


Finally I was browsing through Jeremy Chadwick's list of known ATA  
problems and spotted his comments about smartd self-tests causing  
problems.  Sure enough, my long self test was scheduled for 5am on  
Saturday mornings.  Rechecking the observed slow-down periods  
confirmed that the problem never became visible before 5am.   
(sometimes it took up to 45 minutes before things slowed down enough  
to set off monitoring alarms)


So, long story short, if you're having weirdness in system time  
response - check the smartd configuration, and try disabling the self  
tests.  The short self test I was running daily didn't appear to  
affect anything, but the long test was just bringing the system to  
just shuddering and limping at best.
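
To find scheduled long self-tests, look for "-s" directives in smartd.conf
whose schedule contains an "L/" entry. In smartd's T/MM/DD/d/HH schedule
format, something like L/../../6/05 means a long test on day-of-week 6
(Saturday) at 05:00. The sketch below assumes that format;
list_long_selftests is a hypothetical helper, and /usr/local/etc/smartd.conf
is assumed to be where the port installs its config.

```shell
# list_long_selftests FILE: print non-comment smartd.conf lines whose -s
# schedule includes a long ("L/") self-test.
list_long_selftests() {
    grep -v '^[[:space:]]*#' "$1" | grep -e '-s[[:space:]].*L/'
}

# Example (config path is an assumption):
#   list_long_selftests /usr/local/etc/smartd.conf
```

Removing the "L/..." part of the schedule from the lines it prints (and
restarting smartd) disables the long test while keeping the short one.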



Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-11-24 Thread Jo Rhett

This is now filed as PR 129149

http://www.freebsd.org/cgi/query-pr.cgi?pr=129149

Given the nature of this bug, can I persuade someone to mark this as  
blocking 6.4-RELEASE ?


On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:

On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:

On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
So I booted up by CD and used Fixit mode to switch the system to boot
via serial (keyboard detached), but this gathered me even less.

/boot.config: -Dh
Consoles: internal video/keyboard  serial port
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS drive D: is disk2
BIOS 639kB/4062144kB available memory

FreeBSD/i386 bootstrap loader, Revision 1.1
([EMAIL PROTECTED]

Plugging back in the monitor after lockup showed only a single char
more:
([EMAIL PROTECTED]


This confirms it is hanging in one of the two BIOS routines to output a
character.  One thing you can do would be to boot up and do the following:


dd if=/dev/mem bs=0x400 count=1 of=idt.out
dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out

Then place those files some place I can fetch them.
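
For anyone wondering which parts of physical memory those two dd commands
capture, the ranges follow from block size times offset. A small sketch
(dd_range is a hypothetical helper, not a dd feature):

```shell
# dd_range BS SKIP COUNT: print the byte range read by
# "dd bs=BS iseek=SKIP count=COUNT" as START-END (inclusive, in hex).
dd_range() {
    start=$(( $1 * $2 ))
    end=$(( start + $1 * $3 - 1 ))
    printf '0x%x-0x%x\n' "$start" "$end"
}

# idt.out:  bs=0x400 (1024 bytes), no iseek, count=1
#   dd_range 1024 0 1      # prints 0x0-0x3ff
# bios.out: bs=64k (65536 bytes), iseek=15, count=1
#   dd_range 65536 15 1    # prints 0xf0000-0xfffff
```

The first range is the first 1 KB of memory, where the real-mode interrupt
vector table normally lives, and the second is the 64 KB just below 1 MB.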


Both files are at http://support.netconsonance.com/freebsd/

FYI, this is notable -- the keyboard does not respond at the boot  
prompt.  I mean the menu where you can escape to the loader prompt,  
with the fat freebsd ascii art.  No keyboard presses are observed  
here.  This is also true for the boot menu on the 6.4 installation  
CD too.


No problems with 6.2 or 6.3

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: smartd long self-test causes drives to hang

2008-11-24 Thread Jo Rhett
On re-reading the message I realized that my message was in danger of  
being content-free.


gmirror whole-disk mirror of seagate 300gb drives

$ atacontrol list
ATA channel 0:
Master:  ad0  ATA/ATAPI revision 7
Slave:   ad1  ATA/ATAPI revision 7

$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 575427344
Providers:
1. Name: mirror/gm0
   Mediasize: 300069051904 (279G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad0
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3917165570
2. Name: ad1
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3874187635


On Nov 24, 2008, at 12:48 PM, Jo Rhett wrote:
I've spent about 3 months tracking down what was causing my personal  
colo box to start getting "sluggish" right around dawn every  
Saturday morning.  It took so long because some mornings I simply  
couldn't pull my head out of my tail enough to do proper debugging.


The cause was *really slow* filesystem response time.  No cron jobs  
in that period.  No specific process ran any slower than another,  
although I eventually learned that ones which did no file i/o were  
fine.  And finally I realized that just "ls -la" was very slow (~1  
minute) even after I had killed off every disk-using process in the  
system.  SMTP and HTTP in particular were basically fubar.


No data loss, just *real slow*.  Nothing other than a soft reboot
ever solved the problem.  Even leaving it running only minimal
processes for 24 hours didn't bring it back to normal.


Finally I was browsing through Jeremy Chadwick's list of known ATA  
problems and spotted his comments about smartd self-tests causing  
problems.  Sure enough, my long self test was scheduled for 5am on  
Saturday mornings.  Rechecking the observed slow-down periods  
confirmed that the problem never became visible before 5am.   
(sometimes it took up to 45 minutes before things slowed down enough  
to set off monitoring alarms)


So, long story short, if you're seeing odd system response
times, check the smartd configuration and try disabling the
self-tests.  The short self-test I was running daily didn't appear
to affect anything, but the long test left the system
shuddering and limping at best.



Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-11-24 Thread Xin LI
Jo Rhett wrote:
> This is now filed as PR 129149
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=129149
> 
> Given the nature of this bug, can I persuade someone to mark this as
> blocking 6.4-RELEASE ?

My wild guess is that this is somehow related to SMP handling, since the
installation process installs an SMP kernel, but the default CD-ROM
kernel is UP for 6.x.  Could you please check whether you have the same
problem with a UP kernel?  (Copy it from the LiveCD or something.)

> On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
>> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>>> via serial (keyboard detached), but this gathered me even less.
>>>>
>>>> /boot.config: -Dh
>>>> Consoles: internal video/keyboard  serial port
>>>> BIOS drive A: is disk0
>>>> BIOS drive C: is disk1
>>>> BIOS drive D: is disk2
>>>> BIOS 639kB/4062144kB available memory
>>>>
>>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>>> ([EMAIL PROTECTED]
>>>>
>>>> Plugging back in the monitor after lockup showed only a single char
>>>> more:
>>>> ([EMAIL PROTECTED]
>>>
>>> This confirms it is hanging in one of the two BIOS routines to output a
>>> character.  One thing you can do would be to boot up and do the
>>> following:
>>>
>>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>>
>>> Then place those files some place I can fetch them.
>>
>> Both files are at http://support.netconsonance.com/freebsd/
>>
>> FYI, this is notable -- the keyboard does not respond at the boot
>> prompt.  I mean the menu where you can escape to the loader prompt,
>> with the fat freebsd ascii art.  No keyboard presses are observed
>> here.  This is also true for the boot menu on the 6.4 installation CD
>> too.
>>
>> No problems with 6.2 or 6.3
>>
>> -- 
>> Jo Rhett
>> Net Consonance : consonant endings by net philanthropy, open source
>> and other randomness
>>
>>


-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!


Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-11-24 Thread Jo Rhett
So boot from CD, go to the live filesystem, mount my root and copy only
/boot/kernel?


Are there any other modules I should copy, or settings I should change?

On Nov 24, 2008, at 1:51 PM, Xin LI wrote:

> Jo Rhett wrote:
>> This is now filed as PR 129149
>>
>>    http://www.freebsd.org/cgi/query-pr.cgi?pr=129149
>>
>> Given the nature of this bug, can I persuade someone to mark this as
>> blocking 6.4-RELEASE ?
>
> My wild guess is that this is somehow related to SMP handling since the
> installation process would install a SMP kernel, but the default CD-ROM
> kernel is UP for 6.x.  Could you please try if you have the same problem
> with UP kernel?  (Copy from LiveCD or something)




Re: Can I get a committer to mark this bug as blocking 6.4-RELEASE ?

2008-11-24 Thread Xin LI
Jo Rhett wrote:
> So boot from CD, go to LIVE filesystem, mount my root and copy only
> /boot/kernel?

Yes.

> Are there any other modules I should copy, or settings I should change?

You should probably replace the whole /boot/kernel directory, i.e.
rename /boot/kernel to /boot/kernel.old and copy the CD's /boot/kernel
in its place.

BTW could you also test whether 7.1-PRERELEASE exhibits the same issue?

> On Nov 24, 2008, at 1:51 PM, Xin LI wrote:
> Jo Rhett wrote:
 This is now filed as PR 129149

http://www.freebsd.org/cgi/query-pr.cgi?pr=129149

 Given the nature of this bug, can I persuade someone to mark this as
 blocking 6.4-RELEASE ?
> 
> My wild guess is that this is somehow related to SMP handling since the
> installation process would install a SMP kernel, but the default CD-ROM
> kernel is UP for 6.x.  Could you please try if you have the same problem
> with UP kernel?  (Copy from LiveCD or something)
> 
 On Nov 5, 2008, at 3:41 PM, Jo Rhett wrote:
> On Oct 27, 2008, at 8:51 AM, John Baldwin wrote:
>> On Friday 24 October 2008 02:48:13 pm Jo Rhett wrote:
>>> So I booted up by CD and used Fixit mode to switch the system to boot
>>> via serial (keyboard detached), but this gathered me even less.
>>>
>>> /boot.config: -Dh
>>> Consoles: internal video/keyboard  serial port
>>> BIOS drive A: is disk0
>>> BIOS drive C: is disk1
>>> BIOS drive D: is disk2
>>> BIOS 639kB/4062144kB available memory
>>>
>>> FreeBSD/i386 bootstrap loader, Revision 1.1
>>> ([EMAIL PROTECTED]
>>>
>>> Plugging back in the monitor after lockup showed only a single char
>>> more:
>>> ([EMAIL PROTECTED]
>>
>> This confirms it is hanging in one of the two BIOS routines to
>> output a
>> character.  One thing you can do would be to boot up and do the
>> following:
>>
>> dd if=/dev/mem bs=0x400 count=1 of=idt.out
>> dd if=/dev/mem bs=64k iseek=15 count=1 of=bios.out
>>
>> Then place those files some place I can fetch them.
>
> Both files are at http://support.netconsonance.com/freebsd/
>
> FYI, this is notable -- the keyboard does not respond at the boot
> prompt.  I mean the menu where you can escape to the loader prompt,
> with the fat freebsd ascii art.  No keyboard presses are observed
> here.  This is also true for the boot menu on the 6.4 installation CD
> too.
>
> No problems with 6.2 or 6.3
>
> -- 
> Jo Rhett
> Net Consonance : consonant endings by net philanthropy, open source
> and other randomness
>
>
> 
> 

-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!


RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0

2008-11-24 Thread Anton Yuzhaninov

A box running fresh RELENG_7 panics under heavy network load (more than 50k
connections).

This panic seems to be sendfile(2) related, because with sendfile disabled in
nginx I can't reproduce the problem.

The backtrace in all cases looks like this:

# kgdb kernel /spool/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: vm_page_unwire: invalid wire count: 0
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
vm_page_unwire() at vm_page_unwire+0x84
sf_buf_mext() at sf_buf_mext+0x3c
mb_free_ext() at mb_free_ext+0x99
sbdrop_internal() at sbdrop_internal+0x1e8
tcp_do_segment() at tcp_do_segment+0x1512
tcp_input() at tcp_input+0x7f7
ip_input() at ip_input+0xa8
ether_demux() at ether_demux+0x1b4
ether_input() at ether_input+0x1bb
bge_intr() at bge_intr+0x3ca
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xea28fd30, rbp = 0 ---
Uptime: 36m47s
Physical memory: 4087 MB
Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 485 469 453 
437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 181 165 149 133 
117 101 85 69 53 37 21 5

#0  doadump () at pcpu.h:195
195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x8031adf8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x8031b25c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x8044a084 in vm_page_unwire (m=Variable "m" is not available.
) at /usr/src/sys/vm/vm_page.c:1410
#4  0x80379a4c in sf_buf_mext (addr=Variable "addr" is not available.
) at /usr/src/sys/kern/uipc_syscalls.c:1720
#5  0x8036e9c9 in mb_free_ext (m=0xff0081f93d00) at /usr/src/sys/kern/uipc_mbuf.c:257
#6  0x80372c38 in sbdrop_internal (sb=0xff00b4161458, len=2896) at mbuf.h:515
#7  0x803d6532 in tcp_do_segment (m=0xff0075c23b00, th=0xff0075c53024, so=0xff00b41612d0,
    tp=0xff00b4154b60, drop_hdrlen=52, tlen=0) at /usr/src/sys/netinet/tcp_input.c:2042
#8  0x803d7bc7 in tcp_input (m=0xff0075c23b00, off0=20) at /usr/src/sys/netinet/tcp_input.c:846
#9  0x803cf108 in ip_input (m=0xff0075c23b00) at /usr/src/sys/netinet/ip_input.c:665
#10 0x803b8004 in ether_demux (ifp=0xff0001255800, m=0xff0075c23b00) at /usr/src/sys/net/if_ethersubr.c:834
#11 0x803b825b in ether_input (ifp=0xff0001255800, m=0xff0075c23b00) at /usr/src/sys/net/if_ethersubr.c:692
#12 0x801bcf5a in bge_intr (xsc=Variable "xsc" is not available.
) at /usr/src/sys/dev/bge/if_bge.c:3160
#13 0x802fb5f0 in ithread_loop (arg=0xff0003711840) at /usr/src/sys/kern/kern_intr.c:1088
#14 0x802f7f7f in fork_exit (callout=0x802fb470 , arg=0xff0003711840,
    frame=0xea28fc80) at /usr/src/sys/kern/kern_fork.c:804
#15 0x8045b88e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:455
#16 0x in ?? ()
#17 0x in ?? ()
#18 0x0001 in ?? ()

in /boot/loader.conf I have:

vm.kmem_size=1536M

# 2 Mb KVA/kmem
net.inet.tcp.tcbhashsize=131072
# 64M KVA
kern.maxbcache=64M
# 4M KVA
kern.ipc.maxpipekva=4M
#
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100

in /etc/sysctl.conf

# 576 Mb KVA/kmem
kern.ipc.nmbclusters=262144

kern.ipc.nmbjumbop=65536
kern.ipc.maxsockets=307200
kern.ipc.somaxconn=4096
kern.maxfiles=307200
kern.maxfilesperproc=102400
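
The "576 Mb" figure in the comment above kern.ipc.nmbclusters works out if each cluster is counted as a 2048-byte cluster plus one 256-byte mbuf. Those two sizes are assumptions here (the usual MCLBYTES and MSIZE values), not something stated in the config. A quick check in Python, with a made-up helper name:

```python
MCLBYTES = 2048   # assumed mbuf cluster size in bytes
MSIZE = 256       # assumed mbuf size in bytes


def cluster_kva_mib(nmbclusters, cluster_bytes=MCLBYTES, mbuf_bytes=MSIZE):
    """Worst-case memory for nmbclusters clusters, each with one mbuf, in MiB."""
    return nmbclusters * (cluster_bytes + mbuf_bytes) / 2**20


total = cluster_kva_mib(262144)
print(total)          # 576.0
print(total / 1536)   # 0.375, the share of vm.kmem_size=1536M
```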

$ sysctl vm.kvm_free
vm.kvm_free: 327151616

netstat -m output, several seconds before panic:

380270/63895/444165 mbufs in use (current/cache/total)
14141/29273/43414/262144 mbuf clusters in use (current/cache/total/max)
14141/29251 mbuf+clusters out of packet secondary zone in use (current/cache)
0/9/9/65536 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
123349K/74555K/197905K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
1 requests for I/O initiated by sendfile
0 calls to protocol drain routines

--
 Anton Yuzhaninov

Re: MFC ZFS: when?

2008-11-24 Thread Robert Watson

On Fri, 21 Nov 2008, Zaphod Beeblebrox wrote:

> In several of the recent ZFS posts, multiple people have asked when this
> will be MFC'd to 7.x.  This query has been studiously ignored as other
> chatter about whatever ZFS issue is discussed.


Presumably the MFC schedule is largely up to Pawel, who did the work.
However, Pawel was travelling last weekend and last week, attending MeetBSD
and the FreeBSD developer summit in the Bay Area, and hasn't been seen on
stable@ since the 17th.  I think it's likely not so much that anyone is being
studiously ignored as that the person who can best answer the question
hasn't been keeping up with the list for a bit.


Robert N M Watson
Computer Laboratory
University of Cambridge



> So in a post with no other bug report or discussion content to distract us,
> when is it intended that ZFS be MFC'd to 7.x?


Re: MFC ZFS: when?

2008-11-24 Thread Andrew Snow


The problem appears to be that the latest ZFS commit in 8-CURRENT relies 
on too many other new features that aren't in 7.1.


After 7.1 is released, then perhaps ZFS and the other new code it 
requires can be moved into 7-STABLE?


- Andrew



Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0

2008-11-24 Thread Anton Yuzhaninov

On 25.11.2008 01:48, Anton Yuzhaninov wrote:
> A box running fresh RELENG_7 panics under heavy network load (more than
> 50k connections).
>
> This panic seems to be sendfile(2) related, because with sendfile
> disabled in nginx I can't reproduce the problem.
>
> The backtrace in all cases looks like this:
>
> # kgdb kernel /spool/crash/vmcore.1
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> panic: vm_page_unwire: invalid wire count: 0
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x182
> vm_page_unwire() at vm_page_unwire+0x84
> sf_buf_mext() at sf_buf_mext+0x3c
> mb_free_ext() at mb_free_ext+0x99
> sbdrop_internal() at sbdrop_internal+0x1e8
> tcp_do_segment() at tcp_do_segment+0x1512
> tcp_input() at tcp_input+0x7f7
> ip_input() at ip_input+0xa8
> ether_demux() at ether_demux+0x1b4
> ether_input() at ether_input+0x1bb
> bge_intr() at bge_intr+0x3ca
> ithread_loop() at ithread_loop+0x180
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xea28fd30, rbp = 0 ---
> Uptime: 36m47s
> Physical memory: 4087 MB
> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 485
> 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197
> 181 165 149 133 117 101 85 69 53 37 21 5
>
> #0  doadump () at pcpu.h:195
> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb) bt
> #0  doadump () at pcpu.h:195
> #1  0x8031adf8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2  0x8031b25c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3  0x8044a084 in vm_page_unwire (m=Variable "m" is not available.
> ) at /usr/src/sys/vm/vm_page.c:1410
> #4  0x80379a4c in sf_buf_mext (addr=Variable "addr" is not available.
> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
> #5  0x8036e9c9 in mb_free_ext (m=0xff0081f93d00) at /usr/src/sys/kern/uipc_mbuf.c:257


Maybe it is a wire_count integer overflow?

The type of wire_count is u_short...
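
To illustrate the hypothesis (and only the hypothesis, nothing here shows that this is what actually happens in the kernel): a 16-bit unsigned counter that is incremented 65536 times wraps back to 0, after which the first decrement would find a count of 0. The Python sketch below models that with ctypes.c_ushort; `wire`, `unwire` and `wire_many` are illustrative stand-ins, not kernel functions.

```python
import ctypes


def wire(count):
    """Increment a counter with 16-bit unsigned wraparound, like a C u_short."""
    return ctypes.c_ushort(count + 1).value


def unwire(count):
    """Decrement the counter, failing on zero the way the reported panic does."""
    if count == 0:
        raise RuntimeError("vm_page_unwire: invalid wire count: 0")
    return count - 1


def wire_many(times, count=0):
    """Apply wire() the given number of times and return the final count."""
    for _ in range(times):
        count = wire(count)
    return count


print(wire_many(65535))  # 65535, the largest value a u_short can hold
print(wire_many(65536))  # 0, the counter has wrapped
```

With more than 50k connections potentially sending the same file, one page being wired that many times is at least conceivable, but that part is speculation.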

--
 Anton Yuzhaninov


FreeBSD 7.0-STABLE Jul 23: panic: ffs_blkfree: freeing free frag

2008-11-24 Thread Scott Lambert
I have a box I am using for hosting jailed web servers.  I did a
test move of a jail from a FreeBSD 6 box to the FreeBSD 7 server,
web1.hosting.  It took forever, 30 minutes to be exact, to create the
jail with the 3GB image file and restore the data from the FreeBSD 6 box
into it.

I created a test archive of the running jail with ezjail-admin on the
FreeBSD 6 box and scp'd it to web1.hosting.  That took 5 minutes.  (I
was timing all of this to estimate how long it would really take later.)

Once the archive was on web1.hosting, I created the new jail using the
archive to populate it.

sudo ezjail-admin create -a test_host_tcworks_net-200811241856.40.tar.gz \
  -s 3G -i testhost.tcworks.net 192.168.1.238

That step took 40 minutes.  According to 'systat -vm 1', da0 tended to
show around 90% utilization, da1 was about 23% and MB/s was about 1.6
for both during the creation of the jail.

After about 20 to 40 minutes of ensuring that the jail was working
properly with the compat6x libs, I decided to erase the test jail and
get ready for doing the transfer for real during the next maintenance
window.

Just before the box stopped responding to me, I had run: 
sudo ezjail-admin delete -w testhost.tcworks.net 

It was maybe 30 seconds after that when I noticed it wasn't
responding.

According to Nagios, it took about 25 minutes to panic, reboot, fsck and
come back up.  Funny, it felt a lot longer.

The gmirror is currently degraded and 'systat -vm 1' is showing 98%
utilization on da0 and 23% utilization on da1 with 35 to 50MB/s on both
da0 and da1.  I hadn't looked at the mirror status before the crash.

21:42:09 Mon Nov 24 $ gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  da0 (84%)
                      da1

I think I'll wait for it to complete the rebuild before I put any disk
load on it looking for when it degraded, if not during the crash.

21:53:34 Mon Nov 24 # gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  da0
                      da1

The disks now look quiet in systat, as expected.

I can't find any messages about it except from when it booted up.  I think the
mirror was whole before the crash.  The console log files go back to
July 21 2008.  The messages log files only go back to Nov 22.  I need to
fix that.

The syslog messages about gm0, the kgdb output, and /var/run/dmesg.boot
are below.  If you want anything else, please let me know.

22:15:02 Mon Nov 24 $ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 149269652
Providers:
1. Name: mirror/gm0
   Mediasize: 146815737344 (137G)
   Sectorsize: 512
   Mode: r6w6e7
Consumers:
1. Name: da0
   Mediasize: 146815737856 (137G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 779766152
2. Name: da1
   Mediasize: 146815737856 (137G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1224070577

21:58:06 Mon Nov 24 $ sudo cat /var/log/console.log | grep gm0
Nov 24 21:01:23 web1 kernel: kernel dumps on /dev/mirror/gm0s1b
Nov 24 21:01:23 web1 kernel: swapon: adding /dev/mirror/gm0s1b as swap device
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1a: 3505 files, 133813 used, 120002 free (2498 frags, 14688 blocks, 1.0% fragmentation)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=967747  OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 18:51 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=967754  OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 19:02 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=1978373  OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=3221225472 MTIME=Nov 24 20:42 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: ZERO LENGTH DIR I=1978491  OWNER=root MODE=40755
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Oct 28 18:38 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=1978492  OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=1073741824 MTIME=Oct 15 18:58 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=2596868  OWNER=root MODE=100644
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109973  OWNER=mysql MODE=100600
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109974  OWNER=mysql MODE=100600
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: SIZE=0 MTIME=Nov 22 02:54 2008  (CLEARED)
Nov 24 21:01:23 web1 kernel: /dev/mirror/gm0s1g: UNREF FILE I=3109975  
OWNE

ioctl DIOCSMBR: Inappropriate ioctl for device

2008-11-24 Thread Rajkumar S
Hi,

I am working on a nanobsd-derived system for updating an embedded
pfSense image. The disk is divided into 4 partitions, two of which are
similar "code" partitions. Only one of the two code partitions is live at
any moment. To update, the image is written to the other code partition
and a command like boot0cfg -s 2 -v ad2 is run to boot from the new
partition.

Instead of using device names I am using bsdlabel and refer to the disks
by their labels in fdisk.

Current partitions are as follows:

nanoimg:~#  fdisk ad2
*** Working on device /dev/ad2 ***
parameters extracted from in-core disklabel are:
cylinders=1999 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1999 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 32, size 239584 (116 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 467/ head 15/ sector 32
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 239648, size 239584 (116 Meg), flag 0
beg: cyl 468/ head 1/ sector 1;
end: cyl 935/ head 15/ sector 32
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 479232, size 2048 (1 Meg), flag 0
beg: cyl 936/ head 0/ sector 1;
end: cyl 939/ head 15/ sector 32
The data for partition 4 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 481280, size 20480 (10 Meg), flag 0
beg: cyl 940/ head 0/ sector 1;
end: cyl 979/ head 15/ sector 32

dmesg shows the following when booting:

ad2: 983MB  at ata1-master PIO4
GEOM: ad2: partition 4 does not start on a track boundary.
GEOM: ad2: partition 4 does not end on a track boundary.
GEOM: ad2: partition 3 does not start on a track boundary.
GEOM: ad2: partition 3 does not end on a track boundary.
GEOM: ad2: partition 2 does not start on a track boundary.
GEOM: ad2: partition 2 does not end on a track boundary.
GEOM: ad2: partition 1 does not start on a track boundary.
GEOM: ad2: partition 1 does not end on a track boundary.
GEOM_LABEL: Label for provider ad2s3 is ufs/cfg.
GEOM_LABEL: Label for provider ad2s4 is ufs/cf.
GEOM_LABEL: Label for provider ad2s1a is ufs/root0.
GEOM_LABEL: Label for provider ad2s2a is ufs/root1.
Trying to mount root from ufs:/dev/ufs/root0
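
The GEOM warnings above can be cross-checked against the fdisk numbers. The Python sketch below assumes that "track boundary" means a multiple of the reported sectors/track (63) and that a partition's end is its start plus its size; both are assumptions about what the kernel checks, not taken from its source. It also recomputes the sizes fdisk prints in Meg.

```python
SECTORS_PER_TRACK = 63   # "sectors/track=63" in the fdisk output
SECTOR_BYTES = 512       # "Media sector size is 512"

# (start, size) in sectors, copied from the fdisk output
PARTITIONS = {
    1: (32, 239584),
    2: (239648, 239584),
    3: (479232, 2048),
    4: (481280, 20480),
}


def check(start, size, spt=SECTORS_PER_TRACK):
    """Return (start_on_boundary, end_on_boundary, size_in_whole_MiB)."""
    start_ok = start % spt == 0
    end_ok = (start + size) % spt == 0   # assumed definition of "end"
    return start_ok, end_ok, size * SECTOR_BYTES // 2**20


for number, (start, size) in PARTITIONS.items():
    print(number, check(start, size))
```

Under these assumptions none of the four partitions starts or ends on a multiple of 63 sectors, which matches the eight warnings, and the sizes come out as 116, 116, 1 and 10 MiB, as fdisk reports.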

Fstab is:

/dev/ufs/root0 / ufs ro 1 1
/dev/ufs/cfg /cfg ufs rw,noauto 2 2
/dev/ufs/cf /cf ufs ro 1 1

Both ad2s1a and ad2s2a are active and they appear in the boot screen as F1
and F2. I can manually press F1 or F2 and boot from either of them.
But when I run the command boot0cfg -s 1 -v ad2 I get

boot0cfg: /dev/ad2: Class not found
boot0cfg: /dev/ad2: ioctl DIOCSMBR: Inappropriate ioctl for device

I have searched Google and the archives and could not find much about this
error. Any help in resolving it would be much appreciated.

with regards,

raj


nfs unreachable: can't get /dev/console for controlling terminal

2008-11-24 Thread Martin

Hi,

Besides the wrong syslogd initialization order in the rc system when
IPv6 is enabled, I have found a second, similar problem with the rc
system on my client desktops.

When you physically detach your NIC, or make the wireless access point
inaccessible, while an NFS file system reached through it is listed in
fstab, the system fails to get access to /dev/console and won't even
start in single-user mode.

This is extremely annoying.

Nov 25 07:58:11 zelda init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Nov 25 07:58:11 zelda init: can't get /dev/console for controlling terminal: Operation not permitted
Nov 25 07:58:42 zelda init: can't get /dev/console for controlling terminal: Operation not permitted
Nov 25 08:00:13 zelda last message repeated 3 times
Nov 25 08:01:15 zelda last message repeated 2 times

--
Martin