On Saturday 25 October 2008, Bruce Evans wrote:
> On Fri, 24 Oct 2008, Thierry Herbelot wrote:
> > the [SUBJ] file contains the following extract (around line 705):
> >
> >          * Default to PAGE_SIZE after much discussion.
> >          * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more correct.
> >          */
> >
> >         sb->st_blksize = PAGE_SIZE;
> >
> > which arrived around four years ago, with revision 1.211 (see
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_vnops.c.diff?r1=1.210;r2=1.211;f=h)
>
> Indeed, this was completely broken long ago (in 1.211).  Before then, and
> after 1.128, some cases worked as intended if not perfectly:
> - regular files: file systems still set va_blksize to their idea of the
>   best i/o size (normally to the file system block size, which is
>   normally larger than PAGE_SIZE and probably better in all cases) and
>   this was used here.  However, for regular files, the fs block size
>   and the application's i/o size are almost irrelevant in most cases
>   due to vfs clustering.  Most large i/o's are done physically with
>   the cluster size (which due to a related bug suite ends up being
>   hard-coded to MAXPHYS (128K) at a minor cost when this is different
>   from the best size).
> - disk files: non-broken device drivers set si_iosize_best to their idea
>   of the best i/o size (normally to the max i/o size, which is normally
>   better than PAGE_SIZE) and this was used here.  The bogus default
>   of BLKDEV_IOSIZE was used for broken drivers (this is bogus because it
>   was for the buffer cache implementation for block devices which no
>   longer exist and was too small for them anyway).
> - non-disk character-special files: the default of PAGE_SIZE was used.
>   The comment about defaulting to PAGE_SIZE was added in 1.128 and is
>   mainly for this case.  Now the comment is nonsense since the value is
>   fixed, not a default.
> - other file types (fifos, pipes, sockets, ...): these got the default of
>   PAGE_SIZE too.
>
> In rev.1.1, st_blksize was set to va_blksize in all cases.  So file systems
> were supposed to set va_blksize reasonably in all cases, but this is not
> easy and they did nothing good except for regular files.
Agreed; in any case, the comment by phk about using ioctl(DIOCGSECTORSIZE) applies.

>
> Versions between 1.2 and 1.127 did weird things like defaulting to DFLTPHYS
> (64K) for most cdevs but using a small size like BLKDEV_IOSIZE (2K) for
> disks.  This gave nonsense like 64K buffers for slow tty devices (keyboards)
> and 2K buffers for fast disks, at least for programs that trust st_blksize
> to be reasonable.  Fortunately, st_blksize is rarely used...
>
> > the net effect of this change is to decrease the block buffer size used
> > in libc/stdio from 16 kbytes (derived from the underlying ufs partition)
> > to PAGE_SIZE == 4 kbytes (fixed value), and consequently the I/O bandwidth
> > is lowered (this is on a slow Flash).
>
> ... except it is used by stdio.  (Another mess here is that stdio mostly
> doesn't use its own BUFSIZ.  It trusts st_blksize if fstat() to determine

This is indeed what I saw, meandering between the libc and the vfs part of
the kernel. In fact, I was mainly wondering whether st_blksize was used
*elsewhere*, and whether bumping the value could break some memory
allocation ...

> st_blksize works.  Of course, the existence of BUFSIZ is a related
> historical mistake -- no fixed size can work best for all cases.  But
> when BUFSIZ is used, it is an even worse default than PAGE_SIZE.)

(as it is even smaller?)

>
> It's interesting that you can see the difference.  Clustering is especially
> good for hiding slowness on slow devices.  Maybe you are using a
> configuration that makes clustering ineffective.  Mounting the file system
> with -o sync or equivalently, doing a sync after every (too-small) write
> would do it.  Otherwise, writes are normally delayed until the next cluster
> boundary.

My use case is small (buffered) writes to a file, between 4 kbytes and
16 kbytes. For example, writing a 16-kbyte file with an st_blksize of 4k is
twice as slow as with 16k (220 ms compared to 110 ms). The penalty is smaller
for 8 kbytes (105 ms vs 66 ms).
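For an application that cannot wait for a kernel change, one possible
userland workaround is to stop relying on st_blksize and to size the stdio
buffer explicitly with setvbuf(). The sketch below is only an illustration
under that assumption: the helper names reported_blksize() and
open_with_buffer() are hypothetical, and nothing here was measured on the
Flash device discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the st_blksize that fstat() reports for
 * an open stream, or -1 if fstat() fails.
 */
long reported_blksize(FILE *fp)
{
    struct stat sb;

    if (fstat(fileno(fp), &sb) != 0)
        return -1;
    return (long)sb.st_blksize;
}

/*
 * Hypothetical helper: open `path` for writing and ask stdio for a
 * fully buffered stream of `bufsize` bytes, instead of letting stdio
 * pick a size on its own.  setvbuf() must be called before any other
 * operation on the stream, which is why it is done right after fopen().
 * Returns NULL on failure.
 */
FILE *open_with_buffer(const char *path, size_t bufsize)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Called as open_with_buffer("out.dat", 16384) before the first
fputs()/fwrite(), this requests a 16-kbyte buffer whatever fstat() reports;
reported_blksize() can then be used to see which value the kernel actually
hands back for that file.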
>
> > I have patched the kernel with a larger, fixed value (simply 4*PAGE_SIZE,
> > to revert to the block size previously used), and the kernel and world
> > seem to be running fine.
> >
> > Seeing the XXX comment above, I'm a bit worried about keeping this new
> > st_blksize value.
> >
> > are there any drawbacks with running with this bigger buffer size value ?
>
> Mostly it doesn't matter, since buffering (clustering) hides the
> differences.

(as seen before, mostly)

> Without clustering, 16K is a much better default for disks
> than 4K, though not as good as the non-default va_blksize for regular
> files.  Newer disks might prefer 32K or 64k, but then the fs block size
> should also be increased from 16K.  Otherwise, increasing the block size
> usually reduces performance, by thrashing caches or increasing latencies.
> With modern cache sizes and disk speeds, you won't see these effects for a
> block size of 64K, so defaulting to 64K would be reasonable for disks.  It
> would be silly for keyboards, but with modern memory sizes you would notice
> this even less than when it was that in old versions.

OK, thanks for the answer: I will put the change through more stress tests
and hope to shake out any problems before putting it into production.

        TfH

>
> Bruce