I just got through diagnosing some problems Mohan Kokal has been
having with not being able to specify a large enough NMBCLUSTERS
on a large-memory (2G) machine. The symptoms were that he was able
to specify 65536 clusters on a 1G machine, but the same parameter
panic'd a 2G machine on boot.
Mohan graciously gave me a login on the system so I could gdb a
live kernel (with 61000 clusters, which worked) to figure out what
was going on. This is what I found:
SYSTEM CONFIG: 2G physical ram, 61000 NMBCLUSTERS, 512 maxusers.

    kernel_map      (1G)    bfeff000 - ff800000
    kmem_map        397MB   c347a000 - daf30000  (mb_map is here - 187MB)
    clean_map       267MB   db474000 - eb350000  (buffer_map is here)
    sf_buf's         35MB
    zone allocator  299MB   (75MB for PVENTRY, 164MB for SWAPMETA)
                    -----
                    998MB   oops!
In other words, he actually ran out of KVM!
The problem we face is that KVM does not scale with real memory. So on
a 1G machine the various maps are smaller, allowing more mbufs to
be specified in the kernel config. On a 2G machine the various maps
are larger, allowing fewer mbufs to be specified. On a 4G machine it
is even worse.
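For a rough sense of the arithmetic, here is a minimal sketch of the
KVM that mb_map consumes for a given cluster count. MCLBYTES and
MSIZE are the stock i386 values; the 4-mbufs-per-cluster ratio is an
assumption for illustration, not the exact kernel formula:

    #include <stdio.h>

    #define MCLBYTES 2048   /* KVM per mbuf cluster (i386 default) */
    #define MSIZE    256    /* KVM per mbuf */

    int
    main(void)
    {
        long nmbclusters = 61000;           /* the working config */
        long nmbufs = 4 * nmbclusters;      /* assumed sizing ratio */
        long kvm = nmbclusters * MCLBYTES + nmbufs * MSIZE;

        printf("mb_map needs ~%ld MB of KVM\n", kvm / 1000000);
        return (0);
    }

For 61000 clusters this reproduces the 187MB mb_map in the listing
above; 65536 clusters would need about 201MB, which simply no longer
fits once the physmem-scaled reservations have grown.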
There are two things I would like to commit for the release:
- I would like to cap the SWAPMETA zone reservation at 70MB, which
  allows us to manage a maximum of 29GB worth of swapped out data.
  This is plenty, and saves us 94MB of KVM, which is roughly
  equivalent to 30,000 nmbclusters/mbufs (a sketch of this cap
  logic follows the list).

- I would like to cap the size of the buffer cache at 200MB, giving
  us another 70MB or so of KVM, which is equivalent to another
  30,000 or so nmbclusters.
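The cap itself is just a clamp applied when the zone's KVM is
reserved. A minimal sketch of the idea, with a hypothetical function
name standing in for the real zone setup code:

    #define SWAPMETA_CAP_MB 70          /* proposed cap, in MB */

    /* Hypothetical: clamp the physmem-scaled SWAPMETA reservation
     * (164MB on the 2G machine above) to the cap. */
    long
    swapmeta_kvm_mb(long scaled_mb)
    {
        return (scaled_mb > SWAPMETA_CAP_MB ? SWAPMETA_CAP_MB : scaled_mb);
    }

On Mohan's machine this turns the 164MB reservation into 70MB, which
is where the 94MB savings comes from.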
I would have kernel options to override the caps. The already existing
NBUF option would override the buffer cache cap, and I would add
a kernel option called SWAPMAX which would override the swapmeta
cap.
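For illustration only - NBUF already exists, SWAPMAX is the proposed
option, and the values and units shown here are just example guesses:

    options         NBUF=4000       # override the 200MB buffer cache cap
    options         SWAPMAX=100     # override the 70MB SWAPMETA cap (proposed)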
These changes will allow large-memory machines to scale KVM a bit
better and reduce unexpected panic-at-boot problems. Swap performance
will not be affected at all, because my original SWAPMETA calculation
was overkill. The buffer cache will be sized as if the machine had
about 1.5GB of ram, so the change only caps it when physmem is larger
than that; the impact should be minimal since the meat of our caching
is the VM page cache, not the buffer cache.
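The buffer cache side is the same clamp pattern. In this sketch the
physmem/8 ratio is only an assumption, picked so the cap engages near
the 1.5GB point just described; the kernel's real autosizing heuristic
is more involved:

    #define BUF_CAP_MB 200              /* proposed cap, in MB */

    /* Hypothetical: MB of KVM to set aside for the buffer cache. */
    long
    buffer_kvm_mb(long physmem_mb)
    {
        long kvm = physmem_mb / 8;      /* stand-in for the real heuristic */

        return (kvm > BUF_CAP_MB ? BUF_CAP_MB : kvm);
    }

A 1.5GB machine computes ~192MB and stays under the cap; 2GB and
larger machines get clamped to 200MB.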
If the release engineer(s) give the OK, I will stage these changes
into -current this weekend and -stable on Monday or Tuesday.
-Matt