Lock Order Reversal in nmount/unmount of devfs on NFS

2010-10-22 Thread Linda Messerschmidt
When mounting a devfs filesystem on top of a directory in an NFS filesystem under 8.1-RELEASE-p1 r213075M, the following lock order reversal is reported:
  lock order reversal:
  1st 0xc264bbdc nfs (nfs) @ /usr/src/sys/kern/vfs_mount.c:1058
  2nd 0xc264bce8 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c

Re: Lock Order Reversal in nmount/unmount of devfs on NFS

2010-10-22 Thread Linda Messerschmidt
On Fri, Oct 22, 2010 at 2:47 PM, Kostik Belousov wrote: > The LORs are believed to be harmless. OK, I won't worry about it. I did check the list on sources.zabbadoz.net and didn't see any for nfs & devfs or nfs & syncer, so I just wanted to be sure. :) Thanks!

Where have all the vnodes gone?

2009-08-01 Thread Linda Messerschmidt
(Reposted from freebsd-questions due to no replies.) With the last few releases, I've noticed a distinct trend toward disappearing vnodes on one of the machines I look after. This machine isn't doing a whole lot. It runs a couple of small web sites, and once an hour it rsync's some files from on

Re: Where have all the vnodes gone?

2009-08-03 Thread Linda Messerschmidt
Sorry, I did not mean to reply off-list. (I had asked if the kernel options suggested were appropriate for a production system.) On Sat, Aug 1, 2009 at 9:34 PM, Attilio Rao wrote: > Penalties in terms of overhead are pretty huge. That would probably mean not putting it on that box. Don't get me

Intermittent system hangs on 7.2-RELEASE-p1

2009-08-26 Thread Linda Messerschmidt
I'm trying to troubleshoot an intermittent Apache performance problem, and I've narrowed it down to what appears to be a brief whole-system hang that lasts from 0.5 to 3 seconds. These hangs occur every few minutes. I took the rather extreme step of doing "ktrace -t cnisuwt -i -d -p 1" and then I wa

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-08-27 Thread Linda Messerschmidt
On Wed, Aug 26, 2009 at 4:42 PM, John Baldwin wrote: > One thing to note is that ktrace only logs voluntary context switches (i.e. > call to tsleep or waiting on a condition variable). It specifically does not > log preemptions or blocking on a mutex, I was not aware, thanks. > so in theory if y

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-10 Thread Linda Messerschmidt
On Thu, Aug 27, 2009 at 5:29 PM, John Baldwin wrote: > Ah, cool, what you want to do is use KTR with KTR_SCHED and then use > schedgraph.py (src/tools/sched) to get a visual picture of what the box does > during a hang.  The timestamps in KTR are TSC cycle counts rather than an > actual wall time w
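Since KTR timestamps are TSC cycle counts rather than wall time, turning the gap between two trace entries into seconds requires dividing by the TSC frequency. A minimal sketch of that conversion follows. The function name and the 2.0 GHz frequency are illustrative assumptions, not values from this thread; on a real FreeBSD x86 box the frequency would typically be read from the machdep.tsc_freq sysctl.

```python
def tsc_delta_seconds(start_cycles, end_cycles, tsc_hz):
    """Convert the gap between two TSC cycle counts into seconds.

    tsc_hz is the TSC frequency in cycles per second (an assumption the
    caller must supply; it is not recorded in the trace entries here).
    """
    if tsc_hz <= 0:
        raise ValueError("tsc_hz must be positive")
    return (end_cycles - start_cycles) / float(tsc_hz)


if __name__ == "__main__":
    # Hypothetical numbers: a 3,000,000,000-cycle gap on a 2.0 GHz TSC.
    print(tsc_delta_seconds(0, 3_000_000_000, 2_000_000_000))
```

This only gives relative durations between events in one trace; it does not recover the absolute wall-clock time of a hang.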

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-10 Thread Linda Messerschmidt
On Thu, Sep 10, 2009 at 12:57 PM, Ryan Stone wrote: > You should be able to run schedgraph.py on a windows machine with python > installed.  It works just fine for me on XP. Don't have any of those either, but I *did* get it working on a Mac right out of the box. Should have thought of that soone

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-10 Thread Linda Messerschmidt
On Thu, Sep 10, 2009 at 2:46 PM, Julian Elischer wrote: > I've noticed that schedgraph tends to show the idle threads slightly > skewed one way or the other.  I think there is a cumulative rounding > error in the way they are drawn due to the fact that they are run so > often.  Check the raw data a

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-10 Thread Linda Messerschmidt
Just to follow up, I've been doing some testing with masking for KTR_LOCK rather than KTR_SCHED. I'm having trouble with this because I have the KTR buffer size set to 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for even a full second of tracing; the sample I'm working with

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
On Fri, Sep 11, 2009 at 11:02 AM, John Baldwin wrote: > Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in > mtx_init() if MTX_SPIN is set) I have *no* idea what you just said. :) Which is fine. But more to the point, I have no idea how to do it. :) > A more recently schedgra

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
On Fri, Sep 11, 2009 at 3:06 PM, John Baldwin wrote: > Something like this: Ah, I understand now. :) Got up to 17 seconds of trace with that change. > Hmm.  It works well for me for doing traces. It definitely works, it just always seems to have some-or-another weird artifact. But, with the l

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
OK, I have learned that ktrdump looks up the name of the process associated with a particular KSE at the time of the dump, so if it's changed since tracing stopped, it will blissfully blame the wrong process. I understand why that's the case, but it still sucks for troubleshooting. :( This ti

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer wrote: > does the system have a serial console? how about a normal console /keyboard? It has an IP KVM. > how often does it hang? and for how long? Well, this is interesting. I got really frustrated with the other approach, so I thought I'd th

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer wrote: > ok now we need to describe the hang..  if you can predictably get a hang > every 7 seconds does this mean that it doesn't respond to keyboard for a > moment every 7 seconds? It's possible. > or that it doesn't accept packets every 7 secon

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-11 Thread Linda Messerschmidt
On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt wrote: > On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer wrote: >> ok now we need to describe the hang..  if you can predictably get a hang >> every 7 seconds does this mean that it doesn't respond to keyboard for a >>

Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-09-12 Thread Linda Messerschmidt
value (128) in combination with only allowing 1 access at a time. 128 requests * (50ms sleep + 2ms request + overhead) ~= 7s. So that was just noise masking the real problem, which is less frequent and less predictable. Sorry for the red herring. :( On Sat, Sep 12, 2009 at 2:52 AM, Linda Messersc
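The 7-second figure in the message above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and the per-request overhead is left as a parameter because the message does not state it.

```python
def cycle_seconds(requests, sleep_ms, request_ms, overhead_ms=0.0):
    """Time for `requests` strictly serialized requests, in seconds."""
    return requests * (sleep_ms + request_ms + overhead_ms) / 1000.0


if __name__ == "__main__":
    # Figures quoted in the message: 128 requests, 50 ms sleep, 2 ms request.
    base = cycle_seconds(128, 50, 2)
    print(f"without overhead: {base:.3f} s")

    # Per-request overhead that would bring the cycle to exactly 7 s.
    overhead_ms = (7000.0 - 128 * (50 + 2)) / 128
    print(f"overhead needed for 7 s: {overhead_ms:.4f} ms per request")
```

Without overhead the cycle comes to about 6.66 s, so under 3 ms of per-request overhead is enough to account for the observed 7-second period.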

Re: ZFS group ownership

2009-09-16 Thread Linda Messerschmidt
On Wed, Sep 16, 2009 at 9:00 AM, Christoph Hellwig wrote: > Btw, on Linux all the common filesystems support the SysV behaviour > by default but have a mount option bsdgroups/grpid that turns on the BSD > behaviour.  I would recommend you do the same just with reversed signs > on FreeBSD. Having

FreeBSD 7.2 + NFS + nullfs + unlink + fstat = Stale NFS File Handle

2009-10-27 Thread Linda Messerschmidt
We have encountered a problem with a weird behavior when NFS and nullfs are combined and a program creates, unlinks, and then fstats a file in the resulting directory. After encountering this problem in the wild, I wrote a quick little C program. It creates a file, unlinks the file, and then fsta
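The original test program was written in C and is not reproduced in this excerpt. The sequence it is described as performing (create a file, unlink it, then fstat the still-open descriptor) can be sketched in Python as below. The function name and the choice of tempfile.mkstemp to create the file are assumptions made for illustration. On an ordinary local filesystem the fstat is expected to succeed; the report concerns the case where it fails with a stale NFS file handle error when the directory is a nullfs mount over NFS.

```python
import errno
import os
import sys
import tempfile


def create_unlink_fstat(directory):
    """Create a file in `directory`, unlink it, then fstat the open descriptor.

    Returns the link count reported by fstat. Raises OSError if fstat fails,
    for example with errno.ESTALE in the scenario described in the report.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)        # the name is gone, but the descriptor stays open
        st = os.fstat(fd)      # the step that is reported to fail
        return st.st_nlink
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Directory to test; defaults to the system temporary directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    try:
        print("fstat ok, st_nlink =", create_unlink_fstat(target))
    except OSError as exc:
        stale = " (stale file handle)" if exc.errno == errno.ESTALE else ""
        print("fstat failed:", exc.strerror + stale)
```

Running it once against a plain NFS directory and once against the nullfs view of the same directory would show whether the failure depends on the nullfs layer.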

Re: FreeBSD 7.2 + NFS + nullfs + unlink + fstat = Stale NFS File Handle

2009-10-28 Thread Linda Messerschmidt
On Wed, Oct 28, 2009 at 8:48 AM, Andrey Simonenko wrote: > As I understand when a file is opened in NULLFS its vnode gets new > reference on 'count of users', but this new reference is not propagated > to the lower vnode (vnode that is under NULLFS).  When a file is removed > NULLFS passes this op

Superpages on amd64 FreeBSD 7.2-STABLE

2009-11-26 Thread Linda Messerschmidt
We have a squid proxy process with very large memory requirements (10 - 20 GB) on a machine with 24GB of RAM. Unfortunately, we have to rotate the logs of this process once per day. When we do, it fork()s and exec()s about 16-20 child processes as helpers. Since it's got this multi-million-entry

Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-11-26 Thread Linda Messerschmidt
On Thu, Nov 26, 2009 at 10:34 AM, Ryan Stone wrote: > Is squid multithreaded? No, it isn't:
  PID   USERNAME THR PRI NICE SIZE   RES    STATE  C TIME  WCPU  COMMAND
  75086 squid    1   4   0    12571M 12584M kqread 6 31:31 0.68% squid
Thanks!

Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-11-26 Thread Linda Messerschmidt
I think I was not clear with my message, I apologize. I did not mean to suggest that we were asking for help solving a problem with squid rotation. I provided that information as background to discuss what we observed as a potential misbehavior in the new VM superpages feature, in the hope that i

Re: UNIX domain sockets on nullfs still broken?

2009-12-01 Thread Linda Messerschmidt
On Mon, Nov 30, 2009 at 10:14 AM, Ivan Voras wrote: >> What's the sane solution, then, when the only method of communication >> is unix domain sockets? > > It is a security problem. I think the long-term solution would be to add a > sysctl analogous to security.jail.param.securelevel to handle thi

Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-12-10 Thread Linda Messerschmidt
On Wed, Dec 9, 2009 at 9:07 AM, John Baldwin wrote: > There is lower hanging fruit in other areas > in the VM that will probably be worked on first. OK, as long as somebody who knows more than me knows what's going on, that's good enough for me. :) Thanks!

Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-12-10 Thread Linda Messerschmidt
On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter wrote: > I obviously don't have enough clue about this to understand those details. > Hope that someone can enlighten me. I think what he is saying is that they are aware that the current situation is not ideal. vfork() is suggested as a workaround,

Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-12-10 Thread Linda Messerschmidt
Also... On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter wrote: > I use fork myself, because it is easier sometimes, but people writing > big programs such as squid should know better. > If squid doesn't use vfork they likely have a reason. Actually they are probably going to switch to vfork(). T

8.0-RELEASE-p1 Panic "panic: sbdrop"

2009-12-15 Thread Linda Messerschmidt
This is a new one on me:
  panic: sbdrop
  cpuid = 3
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
  panic() at panic+0x182
  sbdrop_internal() at sbdrop_internal+0x323
  soisdisconnected() at soisdisconnected+0xbe
  tcp_close() at tcp_close+0x45
  tcp_do_segment() at tcp_do_segmen

Re: 8.0-RELEASE-p1 Panic "panic: sbdrop"

2009-12-17 Thread Linda Messerschmidt
On Wed, Dec 16, 2009 at 6:52 AM, Robert Watson wrote: > Could you tell us a bit more about the network configuration -- especially, > are you using any tunneling software (such as ipsec), netgraph, or other > less commonly used network features?  Are you using accept filters? Let's see, we are us

ps "time" field jumps backward

2010-02-05 Thread Linda Messerschmidt
Hi all, For most of 7.2, on up to a 7.3-PRERELEASE built yesterday, I've noticed that the "time" field reported by ps and top jumps around for some processes. I've particularly noticed it with MySQL. Here are some repeated ps results (ps axo pid,time,wchan,comm) for the same process over a few m

Re: ps "time" field jumps backward

2010-02-05 Thread Linda Messerschmidt
On Fri, Feb 5, 2010 at 4:28 PM, Dan Nelson wrote: > Ideally, top and ps would total up all > the per-thread CPU counts when displaying the per-process numbers, but it > doesn't seem to. It does seem to total them:
  $ ps axHo pid,lwp,time,wchan,comm | awk '$1 == 1647'
  1647 100401 0:00.63 select