:We've been here before, a couple of times. This started to become an issue
:when the limits were removed and has gotten worse as the vnode and fsnode
:structs have grown over time. We're running into some limits on how much
:space we can give to the kernel since there are a number of folks who
:think that 3GB of process VA space is a minimum. I tend to think that the
:2GB/2GB split that I use on wcarchive is probably more appropriate as a
:default, but like I say, others disagree.
:
:-DG
:
:David Greenman

If we added the capability to the buffer cache to delete B_DELWRI B_VMIO
buffers (leaving dirty VM pages behind), we could reduce the size of
the filesystem buffer cache considerably while at the same time improving
our ability to cache dirty data - assuming all the other problems related
to doing that sort of thing get fixed, that is. I am heading this way
already, as are others. The filesystem buffer cache really needs to be
relegated to handling active I/O and filesystem mappings, not holding
onto dirty data for dear life.

This would require keeping track of most dirty pages, which isn't too
hard to do: we would split the vm_object page list into a clean list and
a dirty list, and keep the notion of clean and dirty vnodes so that the
update daemon doesn't have to change.
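
To make the clean/dirty split concrete, here is a toy model in Python. It
is only a sketch of the idea, not FreeBSD code: the class and method names
(Page, VMObject, mark_dirty, flush, is_dirty) are made up for illustration
and do not correspond to real kernel structures or functions.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    index: int            # page offset within the object
    dirty: bool = False


@dataclass
class VMObject:
    # Toy stand-in for a vm_object: resident pages live on one of two
    # lists instead of a single mixed list (hypothetical layout).
    clean: dict = field(default_factory=dict)
    dirty: dict = field(default_factory=dict)

    def add_page(self, index):
        """Make a page resident; new pages start out clean."""
        page = Page(index)
        self.clean[index] = page
        return page

    def mark_dirty(self, index):
        """Move a page from the clean list to the dirty list."""
        page = self.clean.pop(index, None)
        if page is None:
            return        # already dirty, or not resident
        page.dirty = True
        self.dirty[index] = page

    def flush(self):
        """Write back every dirty page; return the flushed indices."""
        flushed = sorted(self.dirty)
        for index in flushed:
            page = self.dirty.pop(index)
            page.dirty = False
            self.clean[index] = page
        return flushed

    def is_dirty(self):
        # What a vnode-level clean/dirty flag would summarise.
        return bool(self.dirty)


obj = VMObject()
for i in range(4):
    obj.add_page(i)
obj.mark_dirty(2)
obj.mark_dirty(0)
print(obj.is_dirty())     # True
print(obj.flush())        # [0, 2]
print(obj.is_dirty())     # False
```

The point of the sketch is that a flusher only has to walk the dirty
list, so no filesystem buffer needs to stay around just to remember that
a page is dirty.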

If we can reduce the size of the filesystem buffer cache to something
reasonable, more KVA space will be available for other kernel things.

The biggest stumbling block to doing this is the reconstitution overhead
of the buffer cache, as the simple test below demonstrates. On a
Pentium Pro 200, the cost of reconstituting filesystem buffers is roughly
equivalent to 27 MBytes/sec worth of bandwidth (about 85 MBytes/sec when
the data is already in the buffer cache versus about 57 MBytes/sec when
it is not).

Create a big file:

    dd if=/dev/zero of=test bs=32k count=4096

Read back (several times) a chunk small enough to fit in the VM page cache
but too large to fit in the filesystem buffer cache. No actual disk I/O
occurs:

    dd if=test of=/dev/null bs=32k count=256
    8388608 bytes transferred in 0.146539 secs (57244848 bytes/sec)

Read back (several times) a chunk small enough to fit in the VM page cache
*AND* the filesystem buffer cache. No actual disk I/O occurs:

    apollo:/usr/obj# dd if=test of=/dev/null bs=32k count=64
    2097152 bytes transferred in 0.024780 secs (84630712 bytes/sec)
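
For reference, the 27 MBytes/sec figure is the difference between the
two dd throughput numbers above. The short Python sketch below redoes
that arithmetic and also expresses the gap as extra time per 32k read.
The per-read number assumes the whole gap is reconstitution overhead and
that it scales linearly with the amount of data; the test itself does not
isolate either of those, so treat it as a rough estimate.

```python
# Throughput figures as reported by dd in the two runs above (bytes/sec).
UNCACHED_BPS = 57244848   # 8 MB read: VM page cache hit, buffer cache miss
CACHED_BPS = 84630712     # 2 MB read: hit in both caches

READ_SIZE = 32 * 1024     # bs=32k

# Bandwidth lost when the buffers have to be reconstituted.
lost_bps = CACHED_BPS - UNCACHED_BPS

# Extra wall-clock time per 32k read, under the assumptions stated above.
extra_secs_per_read = READ_SIZE / UNCACHED_BPS - READ_SIZE / CACHED_BPS

print(f"throughput lost: {lost_bps / 1e6:.1f} MBytes/sec")   # 27.4
print(f"extra time per 32k read: {extra_secs_per_read * 1e6:.0f} usec")
```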

					-Matt
					Matthew Dillon
					<[EMAIL PROTECTED]>