Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks

2003-06-19 Thread Don Lewis
On 19 Jun, Stefan Eßer wrote: > On 2003-06-18 20:41 -0700, Don Lewis <[EMAIL PROTECTED]> wrote: >> On 18 Jun, Chris Shenton wrote: >> > Don Lewis <[EMAIL PROTECTED]> writes: >> > >> >> Try the very untested patch below ... [ snip ]

patch to let witness monitor the mtx pool

2003-06-23 Thread Don Lewis
I've been running with the patch below for a little while now. It helped me find a situation where a thread attempted to grab a "pool mutex" while it already held one, which I suspect could have caused a deadlock in certain circumstances. In any case, this was illegal because these mutexes are onl

CFR: patch to support creation of multiple mutex pools

2003-07-09 Thread Don Lewis
The patch below enhances the mutex pool code to support the creation and use of multiple mutex pools. It creates one pool of sleep mutexes with the MTX_NOWITNESS flag for use in building higher level (sx and lockmgr) locks. It also creates another pool without MTX_NOWITNESS for general purpose use

Re: CFR: patch to support creation of multiple mutex pools

2003-07-09 Thread Don Lewis
On 9 Jul, I wrote: > The patch below enhances the mutex pool code to support the creation and > use of multiple mutex pools. It creates one pool of sleep mutexes with > the MTX_NOWITNESS flag for use in building higher level (sx and lockmgr) > locks. It also creates another pool without MTX_NOWIT

Re: Deadlock

2003-07-09 Thread Don Lewis
On 9 Jul, Peter Holm wrote: > Here's a trace from a deadlock in a kernel from Jul 8 13:51 UTC: > > http://people.freebsd.org/~pho/stress/cons36.html It sure looks like a mutex implementation problem. Process 4616 is waiting on pool mutex c05dd04c, but this process doesn't show up on the blocked

Re: Kernel built with new GCC panics immediately

2003-07-11 Thread Don Lewis
On 11 Jul, Shizuka Kudo wrote: > > --- Lukas Ertl <[EMAIL PROTECTED]> wrote: >> Hi there, >> >> just wanted to report that a kernel built with the new gcc panics >> immediately when booting. I've seen this on two machines. Panic and reboot >> happens fast that I couldn't get the panic message. >

Re: Kernel built with new GCC panics immediately

2003-07-11 Thread Don Lewis
On 11 Jul, Lukas Ertl wrote: > On Fri, 11 Jul 2003, Alexander Kabaev wrote: > >> Out of curiosity: do you have any non-standard CPUTYPE set? > > No, but I have only "cpu I686_CPU" in my kernel config (worked fine all > the time). > > The panics I get are the same as those from the others, and yo

sporadic disk syncing failures when shutting down

2003-07-12 Thread Don Lewis
I've been updating my current system a lot recently, and twice in the last couple of weeks, the disks have not been properly synced before the system reboots. I've been doing the usual make buildworld make buildkernel make installkernel shutdown -r now make

Re: sporadic disk syncing failures when shutting down

2003-07-12 Thread Don Lewis
On 13 Jul, Jeff Walters wrote: > On Saturday 12 July 2003 11:24 pm, Sean Kelly wrote: > >> > syncing disks, buffers remaining... 54 54 54 54 54 54 54 54 54 54 54 54 >> > 54 54 54 54 54 54 54 54 giving up on 54 buffers >> > Uptime: 6m42s >> > Terminate ACPI >> > Rebooting... >> > >> > Each time thi

Re: Suggested fixes for uidinfo "would sleep" messages

2002-06-18 Thread Don Lewis
On 18 Jun, Alfred Perlstein wrote: > * Nate Lawson <[EMAIL PROTECTED]> [020618 12:17] wrote: >> As with others on the list, I've been getting a lot of witness complaints: >> >> ../../../vm/uma_core.c:1327: could sleep with "process lock" locked from >> ../../../kern/kern_prot.c:511 >> ../../../vm

Re: Removing perl in make world

2002-07-05 Thread Don Lewis
On 6 Jul, Paul Richards wrote: > Let's start with a premise: No-one running current is using it for > anything other than developing FreeBSD. > > Given that premise, then there shouldn't be anything in /usr outside of > /usr/local, that wasn't put there by make world. Likewise the same > should

Re: dump(8) is hosed

2002-07-06 Thread Don Lewis
On 5 Jul, Georg-W. Koltermann wrote: > Am Mi, 2002-07-03 um 17.31 schrieb David O'Brien: >> On a 27-June-2002 23:02:00 UTC system (just before ipfw2 went in, >> pre-KSE3), dump will not complete dumping more than 5GB. At that point >> it stops responding properly to ^T, which should give "DUMP:

Re: dump(8) is hosed

2002-07-06 Thread Don Lewis
On 5 Jul, Georg-W. Koltermann wrote: > Am Mi, 2002-07-03 um 17.31 schrieb David O'Brien: >> On a 27-June-2002 23:02:00 UTC system (just before ipfw2 went in, >> pre-KSE3), dump will not complete dumping more than 5GB. At that point >> it stops responding properly to ^T, which should give "DUMP:

Re: cvs commit: src/sys/tools vnode_if.awk

2002-07-07 Thread Don Lewis
On 7 Jul, Jeff Roberson wrote: > On Sat, 6 Jul 2002, Jeff Roberson wrote: >>- Use 'options DEBUG_VFS_LOCKS' instead of the DEBUG_ALL_VFS_LOCKS >> environment variable to enable the lock verification code. > This was previously disabled because our locking was so bad that we could > not

sshd is complaining about /var/log/lastlog permission

2002-07-07 Thread Don Lewis
Sshd on my current box is logging messages about "sshd[pid]: /var/log/lastlog: permission denied" on my recently updated -current box. The permissions on this file are the defaults. Could this be a side effect of the new privilege separation stuff?

Re: dump(8) is hosed

2002-07-07 Thread Don Lewis
On 7 Jul, Ian Dowse wrote: > In message <[EMAIL PROTECTED]>, Don Lewis writes: >> >>I was finally able to reproduce this by creating a large file >>before doing the dump. Dump(8) is *very* hosed. The UFS2 import broke >>its ability to fol

"pipe mutex" vs. "sigio lock" lock order reversal

2002-07-07 Thread Don Lewis
This error showed up in my logs this morning while I was building some ports on a uni-processor box. I'm running a version of -current from July 7 about 1 AM PDT. Jul 7 07:47:09 scratch kernel: lock order reversal Jul 7 07:47:09 scratch kernel: 1st 0xcabf7980 pipe mutex (pipe mutex) @ /usr/src

ps fails to build with -Werror

2002-07-07 Thread Don Lewis
This should be lots of fun for someone to fix ... ===> bin/ps cc -O -pipe -DLAZY_PS -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wformat=2 -Wno-format-extra-args -Werror -c /usr/src/bin/ps/fmt.c cc -O -pipe -DLAZY_PS -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wformat=2 -Wno-f

Re: cvs commit: src/sys/tools vnode_if.awk

2002-07-07 Thread Don Lewis
On 7 Jul, Jeff Roberson wrote: > > > On Sun, 7 Jul 2002, Don Lewis wrote: >> Debugger(c0420fe4) at Debugger+0x45 >> vn_rdwr(0,c6737800,c6425000,55ac,0,0,1,8,c22c7200,df241aec,c22cc0c0) at >> vn_rdwr+0x18d >> linker_hints_lookup(c04750a0,c,c62df000,5,0

struct stat and _POSIX_SOURCE

2002-07-08 Thread Don Lewis
Building OpenOffice is broken in -current because of a problem in . If _POSIX_SOURCE is defined, does not #include to get the definition of struct timespec, and it substitutes alternate structure members for the struct timespec members. Unfortunately it still attempts to pad the structure with

Re: cvs commit: src/sys/tools vnode_if.awk

2002-07-08 Thread Don Lewis
On 7 Jul, Jeff Roberson wrote: > On Sat, 6 Jul 2002, Jeff Roberson wrote: >> Log: >>- Use 'options DEBUG_VFS_LOCKS' instead of the DEBUG_ALL_VFS_LOCKS >> environment variable to enable the lock verification code. > If you have a crash test box I would appreciate it if you would enable

Re: KSE M-III status & junior hacker project.

2002-07-08 Thread Don Lewis
On 8 Jul, Anthony Jenkins wrote: > I've been looking at the pcm code and I can see where it locks, then allocates > memory with the M_WAITOK flag thing. I'm wondering if there's a standard > procedure for fixing these... would I just nail down the malloc to a > non-sleepable one? Only if th

/usr/src/sys/vm/uma_core.c:1332: could sleep with "kernel linker" locked from /usr/src/sys/kern/kern_linker.c:1797

2002-07-09 Thread Don Lewis
I recently started seeing the warning message: /usr/src/sys/vm/uma_core.c:1332: could sleep with "kernel linker" locked from /usr/src/sys/kern/kern_linker.c:1797 at boot time on my -current box. It appears to be related to the changes in rev 1.90 of kern_linker.c. I suspect that memory is gett

code ordering in coredump() (was: Re: cvs commit: src/sys/tools vnode_if.awk)

2002-07-09 Thread Don Lewis
I was studying the following DEBUG_VFS_LOCKS panic and noticed something bothersome about the ordering of the code in coredump(). It looked to me like it made more sense to verify that the file was something that was valid to dump to before doing the vn_start_write() stuff. Rearranging the code a

Re: What to do with witness verbiage (is this new?)?

2002-07-10 Thread Don Lewis
On 10 Jul, Alex Zepeda wrote: > After the rude awakening that I was after all running current, I've > finally turned on the WITNESS related options for my kernel (and boy is it > wickedly unstable as of now). I haven't had any instability problems in a while on my UP box. > Anyways.. is there any

Re: /usr/src/sys/vm/uma_core.c:1332: could sleep with "kernel li

2002-07-10 Thread Don Lewis
On 9 Jul, John Baldwin wrote: > > On 09-Jul-2002 Don Lewis wrote: >> I recently started seeing the warning message: >> >> /usr/src/sys/vm/uma_core.c:1332: could sleep with "kernel linker" locked >> from /usr/src/sys/kern/kern_linker.c:1797 >> >

VFS lock error in getnewvnode()

2002-07-10 Thread Don Lewis
A box running this morning's -current compiled with DEBUG_VFS_LOCKS coughed up this error part way through a "cvs update" of the ports tree. VOP_GETVOBJECT: x is not locked but should be The stack trace is: getnewvnode() + 0x182 ffs_vget() + 0x73 ufs_lookup() + 0x10df vfs_vnoperate() + 0x13

Re: What to do with witness verbiage (is this new?)?

2002-07-10 Thread Don Lewis
On 10 Jul, Dan Nelson wrote: > I see this one once every 10 seconds or so: > > ../../../vm/uma_core.c:1332: could sleep with "inp" locked from >../../../netinet/tcp_subr.c:935 > ../../../vm/uma_core.c:1332: could sleep with "tcp" locked from >../../../netinet/tcp_subr.c:928 I've never seen th

Re: What to do with witness verbiage (is this new?)?

2002-07-10 Thread Don Lewis
On 10 Jul, Alex Zepeda wrote: > On Wed, Jul 10, 2002 at 02:43:54AM -0700, Don Lewis wrote: > >> I haven't had any instability problems in a while on my UP box. > > Seems like the UP kernels are more unstable for me. Go figure. > >> > ../../../vm/uma_core.c:1

Re: What to do with witness verbiage (is this new?)?

2002-07-11 Thread Don Lewis
On 10 Jul, Alex Zepeda wrote: > On Wed, Jul 10, 2002 at 01:34:50PM -0700, Don Lewis wrote: > >> > ../../../vm/uma_core.c:1332: could sleep with "inp" locked from >../../../netinet/tcp_subr.c:935 >> > ../../../vm/uma_core.c:1332: could sleep with "tcp"

Re: What to do with witness verbiage (is this new?)?

2002-07-11 Thread Don Lewis
On 11 Jul, Josef Karthauser wrote: > On Thu, Jul 11, 2002 at 11:35:46AM +0100, Josef Karthauser wrote: >> >> > I tracked it down to tcp_getcred() calling SYSCTL_OUT(), which can >> > potentially block, before releasing the locks tcp_getcred() is holding. >> > This routine is used by the net.inet.

Re: What to do with witness verbiage (is this new?)?

2002-07-11 Thread Don Lewis
On 11 Jul, Josef Karthauser wrote: > On Thu, Jul 11, 2002 at 04:01:08AM -0700, Don Lewis wrote: >> On 11 Jul, Josef Karthauser wrote: >> >> I get it whenever cron kicks off a cvsup also. >> The cvsup server may also be making ident queries. > > If it is, it is m

Re: "pipe mutex" vs. "sigio lock" lock order reversal

2002-07-11 Thread Don Lewis
On 7 Jul, Don Lewis wrote: > This error showed up in my logs this morning while I was building some > ports on a uni-processor box. I'm running a version of -current from > July 7 about 1 AM PDT. > > Jul 7 07:47:09 scratch kernel: lock order reversal > Jul 7 07:47

Re: openoffice is compiling again!...but won't run.

2002-07-11 Thread Don Lewis
On 11 Jul, walt wrote: > I just finished compiling and installing openoffice on yesterday's > -current, thanks to the stat.h patch from Bruce. It even runs properly for me if I access my previously setup home directory NFS mounted from a stable box. Also, it successfully reads a Word document th

Re: "pipe mutex" vs. "sigio lock" lock order reversal

2002-07-11 Thread Don Lewis
On 11 Jul, Don Lewis wrote: > On 7 Jul, Don Lewis wrote: >> Jul 7 07:47:09 scratch kernel: lock order reversal >> Jul 7 07:47:09 scratch kernel: 1st 0xcabf7980 pipe mutex (pipe mutex) @ >/usr/src/sys/kern/sys_pipe.c:451 >> Jul 7 07:47:09 scratch kernel: 2nd 0xc047430

Re: What to do with witness verbiage (is this new?)?

2002-07-11 Thread Don Lewis
On 11 Jul, Don Lewis wrote: > On 10 Jul, Alex Zepeda wrote: >> On Wed, Jul 10, 2002 at 01:34:50PM -0700, Don Lewis wrote: >> >>> > ../../../vm/uma_core.c:1332: could sleep with "inp" locked from >../../../netinet/tcp_subr.c:935 >>> > ../../

Re: What to do with witness verbiage (is this new?)?

2002-07-12 Thread Don Lewis
On 12 Jul, Alex Zepeda wrote: > On Wed, Jul 10, 2002 at 01:36:46PM -0700, Don Lewis wrote: > >> It'll drop into ddb every time you get a witness error and you'll have >> to tell ddb to continue. This could be a mite annoying if you are >> getting errors ev

Re: Here's a new(er) one

2002-07-13 Thread Don Lewis
What was the original panic message, the one where uma_core.c prints the name of the lock being held and where it was locked? On 12 Jul, Alex Zepeda wrote: > * $FreeBSD: src/sys/kern/vfs_bio.c,v 1.319 2002/07/10 17:02:28 dillon Exp $ > * $FreeBSD: src/sys/kern/vfs_syscalls.c,v 1.267 2002/07/02 1

Re: Here's a new(er) one

2002-07-13 Thread Don Lewis
On 13 Jul, zipzippy wrote: > On Sat, Jul 13, 2002 at 07:28:43PM -0700, Don Lewis wrote: > >> What was the original panic message, the one where uma_core.c prints the >> name of the lock being held and where it was locked? > > Any way to determine this post-mortem? I wo

Re: VOP_GETATTR panic on Alpha

2002-07-16 Thread Don Lewis
On 16 Jul, Dag-Erling Smorgrav wrote: > Andrew Gallatin <[EMAIL PROTECTED]> writes: >> Just clear panicstr (w panicstr 0) when you drop into >> the debugger on a panic. > > No luck. However, I added an ASSERT_VOP_LOCKED() to vn_statfile(), > and confirmed that vn_lock() fails to lock the vnode.

unbloating {tcp,tcp6,udp,udp6}_getcred()

2002-07-29 Thread Don Lewis
The tcp_getcred(), tcp6_getcred(), udp_getcred(), and udp6_getcred() functions look like a bad example of mostly duplicated code caused by cut and paste programming. By passing a pointer to the inpcbinfo structure as an argument to the sysctl handler, it is possible to use a common handler for the TC

Re: sleeping with "mntvnode" locked...

2002-08-19 Thread Don Lewis
On 19 Aug, Alex Zepeda wrote: > ../../../kern/kern_synch.c:454: sleeping with "mntvnode" locked from >../../../kern/vfs_subr.c:2789 > panic: from debugger > cpuid = 0; lapic.id = > --- > > GNU gdb 5.2.0 (FreeBSD) 20020627 > This GDB was configured as "i386-undermydesk-freebsd"...

Re: Solved: CURRENT and P-IV problems

2002-08-21 Thread Don Lewis
On 21 Aug, Martin Blapp wrote: > > Hi, > > Try to compile the entire system on another box, install it then > on the CURRENT target box, and try again ! > > By the way, after 6 rounds, I see now SIG4 and SIG11 too :-/ > Too bad - so it's definitely data corruption in CURRENT. > > Asus Board P4B

Re: Solved: CURRENT and P-IV problems

2002-08-21 Thread Don Lewis
On 21 Aug, Don Lewis wrote: > On 21 Aug, Martin Blapp wrote: >> >> Hi, >> >> Try to compile the entire system on another box, install it then >> on the CURRENT target box, and try again ! >> >> By the way, after 6 rounds, I see now SIG4 and SIG

Re: Memory corruption in CURRENT

2002-08-22 Thread Don Lewis
On 22 Aug, Mark Santcroos wrote: > On Thu, Aug 22, 2002 at 09:43:45AM +0200, Martin Blapp wrote: >> That's memory corruption. I'm also not able anymore >> to make 10 buildworlds (without -j, that triggers >> panics in pmap code). >> >> By the way, I'm experiencing this since about 4-5 months. >>

Re: Memory corruption in CURRENT

2002-08-22 Thread Don Lewis
On 22 Aug, Soeren Schmidt wrote: > However, this kind of problem in most cases spells bad HW to me, > ie subspec RAM, poor powersupply, badly cooled CPU, overclocking etc etc... My motherboard chipset supports ECC RAM and I have ECC RAM installed. I upgraded to an expensive Antec power supply t

Re: Memory corruption in CURRENT

2002-08-22 Thread Don Lewis
On 22 Aug, Terry Lambert wrote: > Alternatively, rather than those options, try losing 512M of > the RAM... I note they are all 1G boxes. When I first put this system together several months ago, I only installed the first 512M of RAM and the problem was much worse. I only had about a 50% chanc

Re: Page faults from bento cluster (Re: Problems reading vmcores)

2002-08-31 Thread Don Lewis
On 31 Aug, Kris Kennaway wrote: > panic: page fault > panic messages: > --- > Fatal trap 12: page fault while in kernel mode > fault virtual address = 0x4 Looks like a NULL structure pointer dereference. It looks like the access is four bytes into the structure. > #7 0xc021d91f in exec_elf3

Re: Page faults from bento cluster (Re: Problems reading vmcores)

2002-09-01 Thread Don Lewis
On 31 Aug, Kris Kennaway wrote: > Another page fault in umount I haven't seen any reports of this one before. > #6 0xc0399a48 in calltrap () at {standard input}:98 > #7 0xc029198d in vflush (mp=0xc5e6, rootrefs=0, flags=2) at vnode_if.h:309 > #8 0xc0200eaa in devfs_unmount (mp=0xc5e6,

Re: HEADS UP: GCC 3.2.1-pre imported

2002-09-03 Thread Don Lewis
On 1 Sep, Alexander Kabaev wrote: > GCC 3.2.1-pre is now in the tree. Please let me know if you see any > problems recompiling your world/kernel. I haven't seen any other reports of this problem. I'm upgrading from a September 1st version of -current. cc -O -pipe -DIN_GCC -DHAVE_CONFIG_H -DPRE

Re: Page faults from bento cluster (Re: Problems reading vmcores)

2002-09-03 Thread Don Lewis
On 31 Aug, Kris Kennaway wrote: > Another one. I have the cores if anyone needs to look at > them..otherwise I'll stop posting these for now. > > Kris > > panic: page fault > panic messages: > --- > Fatal trap 12: page fault while in kernel mode > fault virtual address = 0x4 > fault code

cvsup10 broken (was: Re: HEADS UP: GCC 3.2.1-pre imported)

2002-09-03 Thread Don Lewis
On 3 Sep, Don Lewis wrote: > On 1 Sep, Alexander Kabaev wrote: >> GCC 3.2.1-pre is now in the tree. Please let me know if you see any >> problems recompiling your world/kernel. > > I haven't seen any other reports of this problem. I'm upgrading from > a September 1s

Re: HEADS UP: i386 a.out binary users!

2002-09-07 Thread Don Lewis
On 7 Sep, Manfred Antar wrote: > At 12:23 AM 9/7/2002 -0700, Terry Lambert wrote: >>Peter Wemm wrote: >>> You will need to either add: >>> options COMPAT_AOUT >>> to your kernel config when you next rebuild, or do a 'kldload aout' >>> when you want to run an old a.out binary. >> >>Is this going t

vnode lock assertion problem in nfs_link()

2002-09-09 Thread Don Lewis
nfs_link() contains the following code: /* * Push all writes to the server, so that the attribute cache * doesn't get "out of sync" with the server. * XXX There should be a better way! */ VOP_FSYNC(vp, cnp->cn_cred, MNT_WAIT, cnp->cn_thread); T

Re: vnode lock assertion problem in nfs_link()

2002-09-09 Thread Don Lewis
On 9 Sep, Don Lewis wrote: > nfs_link() contains the following code: > > /* > * Push all writes to the server, so that the attribute cache > * doesn't get "out of sync" with the server. > * XXX There should be a better way! >

Re: vnode lock assertion problem in nfs_link()

2002-09-09 Thread Don Lewis
On 9 Sep, Robert Watson wrote: > On Mon, 9 Sep 2002, Don Lewis wrote: >> I think we can probably just lock and unlock vp around the call to >> VOP_FSYNC() ... > > What I'd actually like to do is lock vp on going in to the VOP. I need to > grab the lock in the lin

Re: vnode lock assertion problem in nfs_link()

2002-09-09 Thread Don Lewis
On 9 Sep, Robert Watson wrote: > What I'd actually like to do is lock vp on going in to the VOP. I need to > grab the lock in the link() code anyway to do the MAC check. UFS and > others all immediately lock the vnode on entry anyway... Here's a patch to implement this. It compiles and seems

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 10 Sep, Bruce Evans wrote: >> The locking changes in union_link() need a thorough review, >> though the light testing of that I performed didn't turn up any >> glaring problems. > > The changes are obviously just cleanups for leaf file systems, but I > wonder why everythin

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 10 Sep, Bruce Evans wrote: > On Tue, 10 Sep 2002, Don Lewis wrote: > >> On 10 Sep, Bruce Evans wrote: >> > The changes are obviously just cleanups for leaf file systems, but I >> > wonder why everything wasn't always locked at the top. Could it have

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 10 Sep, Don Lewis wrote: > On 10 Sep, Bruce Evans wrote: >> On Tue, 10 Sep 2002, Don Lewis wrote: >> >>> On 10 Sep, Bruce Evans wrote: > >>> > The changes are obviously just cleanups for leaf file systems, but I >>> > wonder why everyth

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 10 Sep, Terry Lambert wrote: > Bruce Evans wrote: >> The changes are obviously just cleanups for leaf file systems, but I >> wonder why everything wasn't always locked at the top. Could it have >> been because locking all the way down is harmful? > > For a stacked local media FS, you can end

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 11 Sep, Bruce Evans wrote: > On Tue, 10 Sep 2002, Don Lewis wrote: > > I have just one thing to add to Robert's reply. > >> BTW, is it safe to call ASSERT_VOP_UNLOCKED() in the SMP case after the >> reference has been dropped with vput() or vrele()? > > I

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 10 Sep, Robert Watson wrote: > On Tue, 10 Sep 2002, Don Lewis wrote: >> I'm mostly worried about the vnode being recycled as something else >> after the vput() or vrele() call. I think a better approach would be to >> add the assertion checks to vput() and vrele()

Re: vnode lock assertion problem in nfs_link()

2002-09-10 Thread Don Lewis
On 9 Sep, Don Lewis wrote: > On 9 Sep, Robert Watson wrote: > >> What I'd actually like to do is lock vp on going in to the VOP. I need to >> grab the lock in the link() code anyway to do the MAC check. UFS and >> others all immediately lock the vnode on entry

vnode lock assertion problem in nfs_rename()

2002-09-10 Thread Don Lewis
On 9 Sep, Don Lewis wrote: > nfs_link() contains the following code: > > /* > * Push all writes to the server, so that the attribute cache > * doesn't get "out of sync" with the server. > * XXX There should be a better way! >

Re: Locking problems in exec

2002-09-10 Thread Don Lewis
On 7 Sep, Garrett Wollman wrote: > I just noted the following: > > ../../../vm/uma_core.c:1332: could sleep with "process lock" locked from >../../../kern/kern_exec.c:368 > lock order reversal > 1st 0xc438e6a8 process lock (process lock) @ ../../../kern/kern_exec.c:368 > 2nd 0xc0413d20 fileli

Re: Locking problems in exec

2002-09-10 Thread Don Lewis
On 10 Sep, Nate Lawson wrote: > I'm not sure why fdcheckstd() and setugidsafety() couldn't both happen > before grabbing the proc lock. Dropping locks in the middle or > pre-allocating should always be a last resort. That is ok as long as there aren't other threads that can mess things up after

Re: Locking problems in exec

2002-09-10 Thread Don Lewis
On 10 Sep, Don Lewis wrote: > On 10 Sep, Nate Lawson wrote: > >> I'm not sure why fdcheckstd() and setugidsafety() couldn't both happen >> before grabbing the proc lock. Dropping locks in the middle or >> pre-allocating should always be a last resort. >

nfs_inactive() bug? -> panic: lockmgr: locking against myself

2002-09-10 Thread Don Lewis
I've gotten a couple "lockmgr: locking against myself" panics today and it seems to be somewhat reproducible. I had the serial console hooked up for the last one, so I was able to get a stack trace: panic: lockmgr: locking against myself Debugger("panic") Stopped at Debugger+0x45: xchgl

Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself

2002-09-10 Thread Don Lewis
On 10 Sep, Don Lewis wrote: > It looks like the call to vrele() from vn_close() is executing the > following code: > > if (vp->v_usecount == 1) { > vp->v_usecount--; > /* > * We must call VOP_IN

Re: Locking problems in exec

2002-09-11 Thread Don Lewis
On 11 Sep, John Baldwin wrote: > > On 11-Sep-2002 Don Lewis wrote: >> On 10 Sep, Don Lewis wrote: >>> On 10 Sep, Nate Lawson wrote: >>> >>>> I'm not sure why fdcheckstd() and setugidsafety() couldn't both happen >>>> before g

Re: Softupdate panic: softdep_update_inodeblock: update failed

2002-09-12 Thread Don Lewis
On 12 Sep, Martin Blapp wrote: > > Hi, > >> If I were you I'd start swapping memory modules, because I'm not having > > Already did that. I even used ECC ram. > >> any trouble with -CURRENT and I havn't seen anyone else having trouble. > > Did you try to build a huge project ? If I don't comp

Re: Softupdate panic: softdep_update_inodeblock: update failed

2002-09-12 Thread Don Lewis
On 12 Sep, Martin Blapp wrote: > Just a thought ... What type of disks are you using? I'm running SCSI >> here. > > ATA ... But I should see disk errors then ... > > I've bought now new disks and will try to build on them. It's not that I think your hardware is defective. I'm wondering if dif

Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself

2002-09-12 Thread Don Lewis
On 11 Sep, Ian Dowse wrote: > In message <[EMAIL PROTECTED]>, Don Lewis writes: >> >>A potentially better solution just occurred to me. It looks like it >>would be better if vrele() waited to decrement v_usecount until *after* >>the call to VOP_INACTIVE() (and

Re: panic: buffer not busy ??? (pagefault)

2002-09-12 Thread Don Lewis
On 13 Sep, Martin Blapp wrote: > > I've got some news here ... > > I have three different type of disks > > 1) - ATA100 Raid, 2 Disks a 80GB (striped) > 2) - Vinum ATA Raid, 2 Disks a 16GB (striped) > 3) - SCSI Disk > > I encounter pagefaults and all these nice panics on 1) and 2). > > I don'

Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself

2002-09-12 Thread Don Lewis
On 13 Sep, Ian Dowse wrote: > For example, if you hold the reference count at 1 while calling the > cleanup function, it allows that function to safely add and drop > references, but if that cleanup function has a bug that drops one > too many references then you end up recursing instead of detec

Re: filesystem corruption ?

2002-09-17 Thread Don Lewis
On 17 Sep, Michael Reifenberger wrote: > On Tue, 17 Sep 2002, Martin Blapp wrote: > >> Date: Tue, 17 Sep 2002 02:29:41 +0200 (CEST) >> From: Martin Blapp <[EMAIL PROTECTED]> >> To: [EMAIL PROTECTED] >> Cc: Michael Reifenberger <[EMAIL PROTECTED]>, Peter Wemm <[EMAIL PROTECTED]>, >> [EMAIL PR

Re: Crashdumps available for download ... please help

2002-09-18 Thread Don Lewis
On 18 Sep, Martin Blapp wrote: > > Hi Robert, > >> Chances are, if you change an important variable such as memory size, it >> will change the failure mode for this bug. Carefully marking the memory > > Sigh. > > Looks like you are right. After running the system now for 8 hours, > I got exac

Re: Crashdumps available for download ... please help

2002-09-18 Thread Don Lewis
On 18 Sep, Garrett Wollman wrote: > < said: > >> 10. Upgraded to gcc3.2. I was seeing now some SIG11 during builds, >> and - panics ! Softupdates and fs panics mostly. I turned off >> softupdates. The panic was different, but all the time it was >> in mmap. > > I'm not seeing panics,

booting with a serial console disables the screen saver

2002-09-18 Thread Don Lewis
I needed to configure my -current box to use a serial console so that I could get to ddb while the system was running the X11 server on the main console, so I hooked up the serial cable and rebooted with the -h option in boot.config. After I did this, I noticed that the green_saver kld would no l

Re: booting with a serial console disables the screen saver

2002-09-18 Thread Don Lewis
On 18 Sep, Doug White wrote: > On Wed, 18 Sep 2002, Don Lewis wrote: > >> I needed to configure my -current box to use a serial console so that I >> could get to ddb while the system was running the X11 server on the main >> console, so I hooked up the serial cable

Re: kernel crash at boot time

2002-09-19 Thread Don Lewis
On 19 Sep, David Xu wrote: > Fatal trap 12: page fault while in kernel mode > fault virtual address = 0x10 > fault code = supervisor read, page not present > instruction pointer = 0x8:0xc0227c89 > stack pointer = 0x10:0xcd3029c4 > frame pointer = 0x10:0xcd302

Re: cvs commit: src/sys/sys lockmgr.h

2002-09-25 Thread Don Lewis
I tried booting a kernel with lock checking enabled and got the following panic: panic: mutex vnode interlock owned at vnode_if.h:24 panic() _mtx_assert() VOP_ISLOCKED() vop_unlock_pre() vput() kern_mkdir()+0x9e - the first call to vput() to handle the "found" case? start_init() fork_exit() fork

memory/filesystem corruption, a cautionary tale (was: Re: Crashdumps available for download (solved I think))

2002-09-27 Thread Don Lewis
On 19 Sep, Martin Blapp wrote: > > Hi all, > > With help of http://www.memtest86.com/memtest86-3.0.iso I've tracked > it down to three 3 ! bad DRAMS. Thanks for the pointer. I have continued to see transient filesystem damage that would disappear with a reboot, which made me suspect that the f

Re: memory/filesystem corruption, a cautionary tale (was: Re: Crashdumps available for download (solved I think))

2002-09-27 Thread Don Lewis
On 27 Sep, walt wrote: > Don Lewis wrote: > >> It looks like either my motherboard BIOS is incorrectly sensing the RAM >> speed, or it senses the RAM speed correctly and is incorrectly >> configuring the RAM timing... > > Is there a BIOS upgrade available for t

Re: World broken at libkvm

2002-09-30 Thread Don Lewis
On 30 Sep, Peter Wemm wrote: > Juli Mallett wrote: >> And now fixed. All we have to look out for now is someone doing something >> that exposes some sort of functional difference, but I don't anticipate it. > I suggest you turn WITNESS on, and stress the system. If you get *new* > 'could sleep

Re: World broken at libkvm

2002-10-01 Thread Don Lewis
On 30 Sep, Juli Mallett wrote: > * De: Don Lewis <[EMAIL PROTECTED]> [ Data: 2002-09-30 ] >> I suggest looking especially closely at the sigio stuff. Even the old >> code has a lock order reversal problem when I/O to a pipe wants to >> signal the process at the other

Re: World broken at libkvm

2002-10-01 Thread Don Lewis
On 30 Sep, Juli Mallett wrote: > No locks except for the lock of the process being signalled should be > held when sending signals, IMHO, though I am mostly ignorant of the SIGIO > locking. BTW, kill() wants to hold the allproc_lock or a process group lock while iterating over the list of proces

Re: system lockup: nfs server not responding 10 > 9 (tx driver bug?)

2002-10-01 Thread Don Lewis
On 1 Oct, Sam Leffler wrote: > I have repeated problems where a machine running -current locks up while > running make over an NFS mounted filesystem. The NFS server is an up to > date -stable machine. When the lockup occurs I get a message: > > nfs server : not responding 10 > 9 > > The file

vnode lock assertion failure in nfs_doio()

2002-10-02 Thread Don Lewis
Version 1.114 of nfs_bio.c added a call to ASSERT_VOP_LOCKED() to nfs_doio(). I've been running a kernel with the DEBUG_VFS_LOCKS option and I can consistently get this assertion to fail by running mozilla with an nfs mounted home directory. The DDB stack trace indicates this assertion fails whe

Re: Junior Kernel Hacker page updated...

2002-10-02 Thread Don Lewis
On 2 Oct, Stefan Farfeleder wrote: > /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" >locked from /freebsd/current/src/sys/kern/kern_event.c:959 > > at me and freezes badly at some point (no breaking into ddb possible). > This is totally repeatable. Is anybo

Re: NFS hang on rmdir(2) with 5.0-current client, server

2002-10-02 Thread Don Lewis
On 2 Oct, Robert Watson wrote: > > Running into an odd (and apparently recent) problem involving rmdir(2) and > NFS. I have a diskless box started using pxeboot: NFS /, MFS /var, MFS > /tmp, recent 5.0-CURRENT. Attempt to rmdir /usr/local (on NFS) results in > NFS hanging. It appears to send

Re: Junior Kernel Hacker page updated...

2002-10-02 Thread Don Lewis
On 2 Oct, Don Lewis wrote: > On 2 Oct, Stefan Farfeleder wrote: > >> /freebsd/current/src/sys/vm/uma_core.c:1307: could sleep with "filedesc structure" >locked from /freebsd/current/src/sys/kern/kern_event.c:959 >> >> at me and freezes badly at some

Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-08 Thread Don Lewis
On 8 Oct, Stefan Farfeleder wrote: > On Mon, Oct 07, 2002 at 03:48:45AM -0700, Terry Lambert wrote: > Following the advice from the spl* man page I turned the spl* calls to a > mutex and was surprised to see it working. My SMP -current survived a 'make > -j16 buildworld' with make using kqueue()

Re: src/games bikeshed time.

2002-10-09 Thread Don Lewis
On 9 Oct, Stephen J. Roznowski wrote: > On 9 Oct, Mark Murray wrote: >>> I've had a patch in the system (bin/12727) since 1999/07/20 that does >>> just this for the NetBSD patches. I've tried a few times to get it >>> committed. See the patch for details... >> >> This is good to have, but it do

Re: [PATCH] Re: Junior Kernel Hacker page updated...

2002-10-09 Thread Don Lewis
On 10 Oct, Stefan Farfeleder wrote: > On Tue, Oct 08, 2002 at 09:26:29PM -0700, Don Lewis wrote: >> On 8 Oct, Stefan Farfeleder wrote: >> > However, WITNESS complains (only once) about this: >> > lock order reversal >> > 1st 0xc662140c kqueue mutex (kqueue

Re: HEADSUP: GCC 3.2.1 update is coming

2002-10-10 Thread Don Lewis
On 10 Oct, Steve Kargl wrote: > On Thu, Oct 10, 2002 at 07:10:44PM -0400, Andrew Gallatin wrote: >> >> This is a terrible example in that its impossible to tell what >> happened, but it sure shows that something is wrong. >> >> Is there a floating point regression suite that you can point me at?

Problem booting -current via /boot/boot1 from GRUB

2002-10-14 Thread Don Lewis
My -current box is set up to use GRUB to allow me to either run FreeBSD or Linux. A while back, I added a third option to allow me to boot FreeBSD with a serial console and just happened to make this the default. Once I had this working, I just turned the serial console mode on and off by

Re: yet another lock order reversal

2002-10-21 Thread Don Lewis
On 21 Oct, Lars Eggert wrote: > lock order reversal > 1st 0xc791bc00 pipe mutex (pipe mutex) @ /usr/src/sys/kern/sys_pipe.c:465 > 2nd 0xc04974e0 sigio lock (sigio lock) @ /usr/src/sys/kern/kern_sig.c:2156 I've been complaining about that one for ages. I think I know how I want to attack it, b

Re: yet another lock order reversal

2002-10-21 Thread Don Lewis
On 21 Oct, Robert Watson wrote: > > On Mon, 21 Oct 2002, Lars Eggert wrote: > >> lock order reversal >> 1st 0xc791bc00 pipe mutex (pipe mutex) @ /usr/src/sys/kern/sys_pipe.c:465 >> 2nd 0xc04974e0 sigio lock (sigio lock) @ /usr/src/sys/kern/kern_sig.c:2156 > > It strikes me that, for better o
