Actually, this brings up a different point. We use 8k blocks now because, when PostgreSQL was developed, it ran on BSD file systems, which prefer 8k blocks. There was also some notion that an 8k write was atomic, though with 512-byte disk blocks that was incorrect. (We knew that at the time too, but we didn't have any options, so we just hoped.)
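
To make the arithmetic concrete: an 8k page covers sixteen 512-byte disk blocks, so a crash partway through the write can leave a page that is part new and part old. The following is only a rough sketch of that situation, not PostgreSQL code. The function name `torn_write` and its parameters are hypothetical, and it assumes the disk blocks reach the platter in order, which real hardware does not guarantee.

```python
PAGE_SIZE = 8192     # page size discussed above
SECTOR_SIZE = 512    # disk block size discussed above


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Return what is on disk if only the first `sectors_written`
    512-byte blocks of the new page landed before a crash.

    Assumption (illustration only): blocks are written in order.
    """
    assert len(old_page) == len(new_page) == PAGE_SIZE
    cut = sectors_written * SECTOR_SIZE
    # New contents up to the cut, stale contents after it.
    return new_page[:cut] + old_page[cut:]


sectors_per_page = PAGE_SIZE // SECTOR_SIZE  # 16 blocks per page

old = b"A" * PAGE_SIZE
new = b"B" * PAGE_SIZE

# Crash after 5 of the 16 blocks: the page is neither old nor new.
on_disk = torn_write(old, new, sectors_written=5)
```

Here `on_disk` matches neither `old` nor `new`, which is the case an atomic 8k write would have ruled out.
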
In fact, we now write pre-modified pages to WAL specifically because we can't be sure an 8k page write to disk will be atomic. Part of the page may make it to disk, and part may not.

Now, with larger RAM and disk sizes, it may be time to consider larger page sizes, like 32k pages. That reduces the granularity of the cache, but it may have other performance advantages that would be worth it.

What people are actually suggesting with the read-ahead for sequential scans is basically a larger block size for sequential scans than for index scans. While this makes sense, it may be better to just increase the block size overall.

---------------------------------------------------------------------------

Curt Sampson wrote:
> On Wed, 24 Apr 2002, Michael Loftis wrote:
>
> > A Block-sized read will not be broken up.  But if you're reading in a
> > size bigger than the underlying system's block sizes then it can get
> > broken up.
>
> In which operating systems, and under what circumstances?
>
> I'll agree that some OSs may not coalesce adjacent reads into a
> single read, but even so, submitting a bunch of single reads for
> consecutive blocks is going to be much, much faster than if other,
> random I/O occurred between those reads.
>
> > If the underlying
> > block size is 8KB and you dump 4MB down on it, the OS may (and in many
> > cases does) decide to write part of it, do a read on a nearby sector,
> > then write the rest.  This happens when doing long writes that end up
> > spanning block groups because the inodes must be allocated.
>
> Um...we're talking about 64K vs 8K reads here, not 4 MB reads.  I am
> certainly not suggesting Postgres ever submit 4 MB read requests to the OS.
>
> I agree that any single-chunk reads or writes that cause non-adjacent
> disk blocks to be accessed may be broken up. But in my sense,
> they're "broken up" anyway, in that you have no choice but to take
> a performance hit.
>
> > Further large read requests can of course be re-ordered by hardware.
> > ...The OS also tags ICP, which can be re-ordered on block-sized chunks.
>
> Right. All the more reason to read in larger chunks when we know what we
> need in advance, because that will give the OS, controllers, etc. more
> advance information, and let them do the reads more efficiently.
>
> cjs
> --
> Curt Sampson <[EMAIL PROTECTED]> +81 90 7737 2974 http://www.netbsd.org
> Don't you know, in this new Dark Age, we're all light. --XTC
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026