:
: I also don't think "sync" is a fix either. I expect "sync" to reclaim
:unused space. For instance, the file system currently shows 9 GB in use
:with "df", but there is only about 5 GB actually present on the disk. I
:ran "sync", and I expected "df" to report about 5GB used, but it doesn't
:seem to change anything. I'm going to try sync again tomorrow once the
:unreclaimed space is about 30GB or so, and see if it does anything.
Try lots of syncs ... like one a second :-). One sync won't do it.
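Something along these lines should do it (just a sketch, run from a spare shell on the box):

    while (1)
        sync
        sleep 1
    end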
But what we really want to do is make the thing crash and hopefully
(with the serial console maybe) get a panic message.
Conventionally, what should be occurring is that the kernel is running
out of some memory pool. If that is what is occurring, it should
generate a panic message prior to rebooting.
A couple of other things you can do:
Compile up the kernel with DDB configured so the system drops into
DDB instead of panicking (only do this if you have access to
the console). Then you should be able to 'trace' and 'ps' prior
to typing 'panic' <return> manually (type as many <return>s as
necessary after that but be careful, you don't want to interrupt
a kernel dump if the kernel has started one!).
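Roughly, that amounts to adding the DDB option to the kernel config,
rebuilding, and then working at the db> prompt once the machine drops
into the debugger. A sketch (the config file name here is just an
example, and the exact option spelling may vary with your source tree):

    # in the kernel config file, e.g. /usr/src/sys/i386/conf/MYKERNEL
    options         DDB

    # once the machine drops into the debugger:
    db> trace
    db> ps
    db> panic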
Using several local xterms with a large back buffer configured,
ssh to the machine under test and set up a couple of csh while(1)
loops to look at various kernel resources, e.g.
    while (1)
        vmstat -z; vmstat -m
    end
The reason you use a local xterm in which you ssh to the remote
machine is so the xterm doesn't disappear on you when the remote
machine crashes :-).
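If you would rather end up with a file than xterm scrollback, a variant
is to drive the loop from the local machine and log locally (a sketch;
'testbox' and the log path are placeholders, and it assumes ssh won't
prompt for a password):

    while (1)
        ssh testbox 'vmstat -z; vmstat -m' >> /tmp/testbox-vmstat.log
        sleep 5
    end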
A tail -f /var/log/messages will probably *NOT* spit out the panic
message quickly enough, but a true serial console (not just a getty
running on the port) should spit it out just fine.
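Setting up a real serial console is basically just pointing the loader
at the serial port and attaching to it from another machine; something
like this (a sketch, device names and details vary by version):

    # /boot/loader.conf on the test machine
    console="comconsole"

    # from the machine on the other end of the null-modem cable
    cu -l /dev/cuaa0 -s 9600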
: One thing that is interesting is that the following sysctl variables are
:always zero:
:
:debug.blk_limit_push: 0
:debug.ino_limit_push: 0
:debug.blk_limit_hit: 0
:debug.ino_limit_hit: 0
:debug.rush_requests: 0
:
: So it doesn't look like softupdates is rushing things out.
These aren't very useful unless you only have a tiny bit of main
memory. For all practical purposes the limit is almost never reached
(which is probably why it's buggy when it *is* reached).
: "vmstat -m" is showing that the storage for "inodedep" is steadily
:increasing.
:
: I _think_ I need to increase tick_delay, so when the max_softdeps limit
:is finally hit, syncer gets run for a while and cleans things up.
tick_delay will probably not have much of an effect.
Look at the vmstat -m output carefully as you run the test (as suggested
above). Bad things happen if you run the kernel out of KVM, and that
can happen even if you have plenty of normal RAM. There are *TWO* limits
involved. There is the limit for the memory pool you are observing,
and there is a global limit on the grand total which is nominally
2x the per-pool limit. If either limit is reached the machine is hosed.
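A cheap way to keep an eye on just the pool in question while the test
runs, along with the softupdates knobs (a sketch; the sysctl names may
differ slightly between versions, check with 'sysctl debug'):

    # current softupdates tunables
    sysctl debug.max_softdeps debug.tick_delay

    # watch the inodedep pool grow
    while (1)
        vmstat -m | grep -i inodedep
        sleep 10
    end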
:Tom
:Uniserve
-Matt
Matthew Dillon
<[EMAIL PROTECTED]>