Sean Chittenden wrote:
> > Basically, we don't know when we read a buffer whether this is a
> > read-only or read/write.  In fact, we could read it in, and another
> > backend could write it for us.
>
> Um, wait.  The cache is shared between backends?  I don't think so,
> but it shouldn't matter because there has to be a semaphore locking
> the cache to prevent the coherency issue you describe.  If PostgreSQL
> didn't, it'd be having problems with this now.  I'd also think that
> MVCC would handle the case of updated data in the cache as that has to
> be a common case.  At what point is the cached result invalidated and
> fetched from the OS?

Uh, it's called the _shared_ buffer cache in postgresql.conf, and we
lock pages only while we are reading/writing them, not for the duration
they are in the cache.

> > The big issue is that when we do a write, we don't wait for it to
> > get to disk.
>
> Only in the case when fsync() is turned off, but again, that's up to
> the OS to manage that can of worms, which I think BSD takes care of
> that.  From conf/NOTES:

Nope.  When you don't have a kernel buffer cache, and you do a write,
where do you expect it to go?  I assume it goes to the drive, and you
have to wait for that.

>
> # Attempt to bypass the buffer cache and put data directly into the
> # userland buffer for read operation when O_DIRECT flag is set on the
> # file.  Both offset and length of the read operation must be
> # multiples of the physical media sector size.
> #
> #options         DIRECTIO
>
> The offsets and length bit kinda bothers me though, but I think that's
> stuff that the kernel has to take into account, not the userland calls,
> I wonder if that's actually accurate any more or affects userland
> calls...  seems like that'd be a bit too much work to have the user
> do, esp given the lack of documentation on the flag...  should be just
> drop in additional flag, afaict.
>
> > It seems to use O_DIRECT, we would have to read the buffer in a
> > special way to mark it as read-only, which seems kind of strange.  I
> > see no reason we can't use free-behind in the PostgreSQL buffer
> > cache to handle most of the benefits of O_DIRECT, without the
> > read-only buffer restriction.
>
> I don't see how this'd be an issue as buffers populated via a read(),
> that are updated, and then written out, would occupy a new chunk of
> disk to satisfy MVCC.  Why would we need to mark a buffer as read only
> and carry around/check its state?

We update the expired flags on the tuple during update/delete.
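
On the offset/length restriction in that NOTES comment: if it does apply
to userland, the caller has to widen each request to whole sectors itself.
A rough sketch of what that might look like, not anything PostgreSQL does.
The 512-byte SECTOR_SIZE and the helpers align_down(), align_up() and
direct_read() are made up for illustration, and the aligned buffer is an
assumption too (some kernels want it, the NOTES text only mentions offset
and length).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* platform lacks the flag: plain buffered read */
#endif

/* Assumed sector size; a real caller would ask the device. */
#define SECTOR_SIZE 512

/* Round a non-negative offset down to the start of its sector. */
off_t
align_down(off_t off)
{
    return off - (off % SECTOR_SIZE);
}

/* Round a length up to a whole number of sectors. */
size_t
align_up(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Read the whole sectors covering [off, off + len) from path.
 * On success, *out points at a sector-aligned buffer the caller must
 * free(), and the number of bytes read is returned.  Returns -1 on error.
 */
ssize_t
direct_read(const char *path, off_t off, size_t len, void **out)
{
    off_t       start = align_down(off);
    size_t      span = align_up((size_t) (off - start) + len);
    void       *buf = NULL;
    ssize_t     n;
    int         fd;

    assert(out != NULL);

    /* Align the buffer as well as the offset and length. */
    if (posix_memalign(&buf, SECTOR_SIZE, span) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    n = pread(fd, buf, span, start);
    close(fd);
    if (n < 0)
    {
        free(buf);
        return -1;
    }

    *out = buf;
    return n;
}
```

With these numbers, a 100-byte read at offset 1000 becomes a 1024-byte
read at offset 512, and the caller finds its data 488 bytes into the
buffer.  That bookkeeping is the extra work being complained about above.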

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073