Peter Jeremy wrote:
Regarding vfs.lorunningspace and vfs.hirunningspace...
On 2010-Jul-15 13:52:43 -0500, Alan Cox <alan.l....@gmail.com> wrote:
Keep in mind that we still run on some fairly small systems with limited I/O
capabilities, e.g., a typical arm platform. More generally, with the range
of systems that FreeBSD runs on today, any particular choice of constants is
going to perform poorly for someone. If nothing else, making these sysctls
a function of the buffer cache size is probably better than any particular
constants.
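For illustration, scaling them off the buffer cache size might look
something like the sketch below; the divisor and the floor value here are
made up for the example, not proposed settings.

static long hirunningspace, lorunningspace;

static void
tune_runningspace(long hibufspace)
{
        /* Let the amount of in-flight write I/O scale with the buffer map. */
        hirunningspace = hibufspace / 64;
        /* Keep a modest floor so small systems still get some parallelism. */
        if (hirunningspace < 1024 * 1024)
                hirunningspace = 1024 * 1024;
        /* Start new writes again once we drop to half of the ceiling. */
        lorunningspace = hirunningspace / 2;
}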
That sounds reasonable but brings up a related issue - the buffer
cache. Given that the unified VM system no longer needs a traditional
Unix buffer cache, what is the buffer cache still used for?
Today, it is essentially a mapping cache. So, what does that mean?
After you've set aside a modest amount of physical memory for the kernel
to hold its own internal data structures, all of the remaining physical
memory can potentially be used to cache file data. However, on many
architectures this is far more memory than the kernel can
instantaneously access. Consider i386. You might have 4+ GB of
physical memory, but the kernel address space is (by default) only 1
GB. So, at any instant in time, only a fraction of the physical memory
is instantaneously accessible to the kernel. In general, to access an
arbitrary physical page, the kernel is going to have to replace an
existing virtual-to-physical mapping in its address space with one for
the desired page. (Generally speaking, on most architectures, even the
kernel can't directly access physical memory that isn't mapped by a
virtual address.)
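To put rough numbers on the i386 example (ignoring options such as
KVA_PAGES that change the split):

        4+ GB of physical memory vs. a 1 GB kernel address space
        => at most about a quarter of RAM can be kernel-mapped at any
           instant, and much of that 1 GB is already used by the kernel
           itself.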
The buffer cache is essentially a region of the kernel address space
that is dedicated to mappings to physical pages containing cached file
data. As applications access files, the kernel dynamically maps (and
unmaps) physical pages containing cached file data into this region.
Once the desired pages are mapped, then read(2) and write(2) can
essentially "bcopy" from the buffer cache mapping to the application's
buffer. (Understand that this buffer cache mapping is a prerequisite
for the copy out to occur.)
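Roughly, and glossing over a lot of detail, that path looks like the
sketch below. Only pmap_qenter() and copyout() are real kernel
interfaces here; the wrapper and its arguments are made up for
illustration.

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_page.h>

static int
buffer_read_copy(vm_offset_t bkva, vm_page_t *pages, int npages,
    void *user_buf, size_t offset, size_t len)
{
        /* Map the cached file pages contiguously into buffer cache KVA. */
        pmap_qenter(bkva, pages, npages);

        /* "bcopy" from the buffer cache mapping to the application. */
        return (copyout((const void *)(bkva + offset), user_buf, len));
}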
So, why did I call it a mapping cache? There is generally locality in
the access to file data. So, rather than map and unmap the desired
physical pages on every read and write, the mappings to file data are
allowed to persist and are managed much like many other kinds of
caches. When the kernel needs to map a new set of file pages, it finds
an older, not-so-recently used mapping and destroys it, allowing those
kernel virtual addresses to be remapped to the new pages.
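In other words, something like the following; the "mapping" structure
and its LRU list are made up, but pmap_qremove() and pmap_qenter() are
the real interfaces used to tear down and rebuild the translations.

#include <sys/param.h>
#include <sys/queue.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_page.h>

struct mapping {
        TAILQ_ENTRY(mapping) link;
        vm_offset_t kva;        /* slice of buffer cache KVA */
        int npages;             /* pages currently mapped there */
};

static TAILQ_HEAD(, mapping) mapping_lru =
    TAILQ_HEAD_INITIALIZER(mapping_lru);

static struct mapping *
mapping_reuse(vm_page_t *newpages, int count)
{
        struct mapping *m;

        /* The oldest mapping sits at the head of the LRU list. */
        m = TAILQ_FIRST(&mapping_lru);
        TAILQ_REMOVE(&mapping_lru, m, link);

        /* Destroy its stale virtual-to-physical translations ... */
        pmap_qremove(m->kva, m->npages);

        /* ... and reuse the same kernel virtual addresses. */
        pmap_qenter(m->kva, newpages, count);
        m->npages = count;

        /* It is now the most recently used mapping. */
        TAILQ_INSERT_TAIL(&mapping_lru, m, link);
        return (m);
}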
So far, I've used i386 as a motivating example. What of other
architectures? Most 64-bit machines take advantage of their large
address space by implementing some form of "direct map" that provides
instantaneous access to all of physical memory. (Again, I use
"instantaneous" to mean that the kernel doesn't have to dynamically
create a virtual-to-physical mapping before being able to access the
data.) On these machines, you could, in principle, use the direct map
to implement the "bcopy" to the application's buffer. So, what is the
point of the buffer cache on these machines?
A trivial benefit is that the file pages are mapped contiguously in the
buffer cache. Even though the underlying physical pages may be
scattered throughout the physical address space, they are mapped
contiguously. So, the "bcopy" doesn't need to worry about every page
boundary, only buffer boundaries.
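For example, on amd64 a copy out through the direct map has to walk the
pages one at a time, because each page's direct-mapped address depends
on its physical address. PHYS_TO_DMAP(), VM_PAGE_TO_PHYS(), and
copyout() are real interfaces; the wrapper is made up, and I'm assuming
page-aligned data to keep the sketch short.

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <machine/vmparam.h>

static int
copyout_via_dmap(vm_page_t *pages, void *user_buf, size_t len)
{
        size_t chunk, done;
        int error, i;

        for (done = 0, i = 0; done < len; done += chunk, i++) {
                chunk = MIN(PAGE_SIZE, len - done);
                error = copyout(
                    (void *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(pages[i])),
                    (char *)user_buf + done, chunk);
                if (error != 0)
                        return (error);
        }
        return (0);
}

With a contiguous buffer cache mapping, the same copy is a single
copyout() of len bytes that doesn't care where the page boundaries fall.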
The buffer cache also plays a role in the page replacement mechanism.
Once mapped into the buffer cache, a page is "wired", that is, it is
removed from the paging lists, where the page daemon could otherwise
reclaim it.
However, a page in the buffer cache should really be thought of as being
"active". In fact, when a page is unmapped from the buffer cache, it is
placed at the tail of the virtual memory system's "inactive" list, the
same place where the virtual memory system would place a physical page
that it is transitioning from "active" to "inactive". If an application
later performs a read(2) from or write(2) to the same page, that page
will be removed from the "inactive" list and mapped back into the buffer
cache. So, the mapping and unmapping process contributes to creating an
LRU-ordered "inactive" queue.
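Conceptually, that life cycle looks like the sketch below; the wrapper
names are made up, locking is omitted, and the vm_page calls are the
interfaces of this era.

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

static void
buffer_cache_map_page(vm_page_t m)
{
        /*
         * Wire the page: take it off the paging queues so that the
         * page daemon cannot reclaim it while it is mapped.
         */
        vm_page_wire(m);
}

static void
buffer_cache_unmap_page(vm_page_t m)
{
        /*
         * Drop the wiring; with a zero "activate" argument the page is
         * queued as if it had just gone from "active" to "inactive",
         * which is how the buffer cache's map/unmap traffic helps keep
         * the inactive queue in LRU order.
         */
        vm_page_unwire(m, 0);
}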
Finally, the buffer cache limits the amount of dirty file system data
that is cached in memory.
... Is the current
tuning formula still reasonable (for virtually all current systems
it's basically 10MB + 10% RAM)?
It's probably still good enough. However, this is not a statement for
which I have supporting data. So, I reserve the right to change my
opinion. :-)
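For a concrete number, taking the rule of thumb Peter quotes at face
value (the actual computation in vfs_bio.c differs in its details and
caps), a machine with 4GB of RAM gets roughly

        10MB + 10% of 4096MB = 10MB + ~410MB ≈ 420MB

of kernel virtual address space set aside for buffer mappings.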
Consider what the buffer cache now does. It's just a mapping cache.
Increasing the buffer cache size doesn't affect (much) the amount of
physical memory available for caching file data. So, unlike ancient
times, increasing the size of the buffer cache isn't going to have
nearly the same effect on the amount of actual I/O that your machine
does. For some workloads, increasing the buffer cache size may have
greater impact on CPU overhead than I/O overhead. For example, all of
your file data might fit into physical memory, but you're doing random
read accesses to it. That would cause the buffer cache to thrash, even
though you wouldn't do any actual I/O. Unfortunately, mapping pages
into the buffer cache isn't trivial. For example, it requires every
processor to be interrupted to invalidate some entries from its TLB.
(This is a so-called "TLB shootdown".)
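For example, a program along the lines of the sketch below would
exercise exactly that case, assuming the file is fully cached in RAM but
larger than the buffer map; the file name and size are made up. There is
no disk I/O, yet every read can force a remap and a shootdown.

#include <sys/types.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
        char buf[16384];
        off_t filesize = 8LL * 1024 * 1024 * 1024;      /* 8GB, all cached */
        off_t off;
        int fd;

        fd = open("/data/bigfile", O_RDONLY);
        if (fd == -1)
                return (1);
        for (;;) {
                /* Pick a random 16KB block anywhere in the file. */
                off = (off_t)(arc4random() % (filesize / sizeof(buf))) *
                    (off_t)sizeof(buf);
                if (pread(fd, buf, sizeof(buf), off) == -1)
                        break;
        }
        close(fd);
        return (0);
}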
... How can I measure the effectiveness
of the buffer cache?
I'm not sure that I can give you a short answer to this question.
The buffer cache size is also very tightly constrained (vfs.hibufspace
and vfs.lobufspace differ by 64KB), and at least one of the underlying
tuning parameters has comments at variance with current reality:
In <sys/param.h>:
* MAXBSIZE - Filesystems are made out of blocks of at most MAXBSIZE bytes
* per block. MAXBSIZE may be made larger without effecting
...
*
* BKVASIZE - Nominal buffer space per buffer, in bytes. BKVASIZE is the
...
* The default is 16384, roughly 2x the block size used by a
* normal UFS filesystem.
*/
#define MAXBSIZE 65536 /* must be power of 2 */
#define BKVASIZE 16384 /* must be power of 2 */
There's no mention of the 64KiB limit in newfs(8) and I recall seeing
occasional comments from people who have either tried or suggested
trying larger blocksizes.
I believe that a block size larger than 64KB would fail an assertion.
Likewise, the default UFS blocksize has
been 16KiB for quite a while. Are the comments still valid and, if so,
should BKVASIZE be doubled to 32768 and a suitable note added to newfs(8)
regarding the maximum block size?
If I recall correctly, increasing BKVASIZE would only reduce the number
of buffer headers. In other words, it might avoid wasting some memory on
buffer headers that won't be used.
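As a rough illustration (using a made-up buffer map size, and following
the description above rather than the exact computation in vfs_bio.c):

        200MB of buffer map / 16KB per buffer ≈ 12800 buffer headers
        200MB of buffer map / 32KB per buffer ≈  6400 buffer headers

So doubling BKVASIZE roughly halves the number of buffer headers, and
the memory spent on them, without changing how much file data can be
cached.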
Alan