Kevin Brown <[EMAIL PROTECTED]> writes:
> One question I have is: in the event of a crash, why not simply replay
> all the transactions found in the WAL? Is the startup time of the
> database that badly affected if pg_control is ignored?

Interesting thought, indeed. Since we truncate the WAL after each checkpoint, seems like this approach would no more than double the time for restart.

The win is it'd eliminate pg_control as a single point of failure. It's always bothered me that we have to update pg_control on every checkpoint --- it should be a write-pretty-darn-seldom file, considering how critical it is.

I think we'd have to make some changes in the code for deleting old WAL segments --- right now it's not careful to delete them in order. But surely that can be coped with.

OTOH, this might just move the locus for fatal failures out of pg_control and into the OS' algorithms for writing directory updates. We would have no cross-check that the set of WAL file names visible in pg_xlog is sensible or aligned with the true state of the datafile area. We'd have to take it on faith that we should replay the visible files in their name order. This might mean we'd have to abandon the current hack of recycling xlog segments by renaming them --- which would be a nontrivial performance hit.

Comments anyone?

> If there exists somewhere a reasonably succinct description of the
> reasoning behind the current transaction management scheme (including
> an analysis of the pros and cons), I'd love to read it and quit
> bugging you. :-)

Not that I know of. Would you care to prepare such a writeup? There is a lot of material in the source-code comments, but no coherent presentation.

			regards, tom lane
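
For illustration only, the "replay the visible files in their name order" idea discussed above might be sketched as follows. This is not PostgreSQL code. The 16-hex-digit file name width, the function names, and the `replay_segment` callback are all hypothetical, chosen only to make the ordering assumption concrete.

```python
import re

# Assumption (for illustration): segment file names are fixed-width,
# upper-case hexadecimal strings, so sorting them as text gives the same
# order as sorting them numerically. The width of 16 is hypothetical.
SEGMENT_NAME = re.compile(r"[0-9A-F]{16}")


def segments_in_replay_order(names):
    """Return the directory entries that look like WAL segments, in name order."""
    return sorted(n for n in names if SEGMENT_NAME.fullmatch(n))


def replay_all(names, replay_segment):
    """Replay every visible segment in name order.

    Nothing here cross-checks the directory listing against any other
    record of which segments should exist; the listing is taken on faith.
    """
    order = segments_in_replay_order(names)
    for name in order:
        replay_segment(name)
    return order


if __name__ == "__main__":
    visible = ["0000000000000003", "0000000000000001",
               "lost+found", "0000000000000002"]
    replay_all(visible, lambda name: print("replaying", name))
```

Note that the sketch decides what to replay purely from file names. If segments were recycled by renaming, a file's name would not necessarily reflect its contents, which is the concern the message raises.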