Sailesh Krishnamurthy <[EMAIL PROTECTED]> writes: > (Just a note: my comments are not pg-specific .. indeed I don't know > much about pg recovery). > ... > BTW, logging raw datafile blocks would be pretty gross (physical > logging) and so ARIES logs the changes to each tuple in "logical" > fashion .. so if only one column changes only that value (before and > after images) are logged. This is what's called "physiological > logging".
What PG currently uses is sort of a hybrid approach. The basic WAL entry normally gives tuple-level detail: a tuple image and location for an INSERT, a new tuple image and old and new locations for UPDATE, a tuple location for DELETE, etc. This might be a bit bulkier than necessary for UPDATEs that change few columns, but it's consistent and easy to replay. However, as I mentioned to Marty, the very first change to any disk page after a checkpoint will dump an entire (post-change) image of that page into the log, in place of the tuple image or other detail that would allow us to redo the change to that page. The purpose of this is to ensure that if a page is only partially written during a power failure, we have the ability to recreate the entire correct page contents from WAL by replaying everything since the last checkpoint. Replay is thus a write-only operation as far as the data files are concerned --- we only write full pages or modify previously written pages. (You may be thinking "well, what about a partial write to WAL" --- but if that happens, that's where we stop replaying WAL. The checksums on WAL entries allow us to detect the invalid data before we use it. So we will still have a consistent valid state on disk, showing the effects of everything up through the last undamaged WAL record.) BTW this is a one-directional log. We do not care about UNDO, only replay. There is nothing that need be undone from a failed transaction; it can only have littered the database with dead tuples, which will eventually be reclaimed by VACUUM. > Having said that, page-oriented undo logging can be a pain when > B-tree pages split. We don't bother to undo index page splits either ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html