"Jacky Leng" <[EMAIL PROTECTED]> writes:
> Shouldn't we write xlog record before we do a physical operation?

The reasoning for not doing it that way was that we can't be sure
beforehand that the filesystem operation will succeed.  If we xlog the
truncate first, it fails, and then we crash, we're in deep trouble
because WAL replay will try to do the truncate and likewise fail,
preventing the system from restarting.

Other non-rollbackable filesystem ops (I think just CREATE/DROP
DATABASE/TABLESPACE) are done the same way.  CREATE DATABASE would be
particularly nasty to reverse the order for, since there are obvious
cases like out-of-disk-space that will make it fail.

> An test case:
> 1. set full_page_writes off;
> 2. startup database; create a table; insert 100000 rows in it; shutdown
>    database;
> 3. startup database again; delete all rows from this table;
> 4. vacuum this table, and it will come into smgrtruncate; kill postmaster
>    before smgrtruncate do xlog stuff(set a breakpoint before xlog stuff);
> 5. startup database the 3rd time, during the recovery, the database will
>    crash with:
>    PANIC: WAL contains references to invalid pages

Hmm.  Maybe we need something like xlog a "tentative truncate", do it,
xlog "real truncate"?  The tentative truncate would merely tell replay
not to be surprised if those blocks aren't there anymore.  Seems a bit
grotty though.

			regards, tom lane
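
To make the "tentative truncate" idea concrete, here is a toy Python model of replay. It is only a sketch under stated assumptions, not PostgreSQL code: the record names ("update", "tentative_truncate", "truncate"), the replay() function, and the InvalidPage exception are all made up for illustration. It assumes replay notes references to missing blocks and complains only at the end, if no later truncate record (tentative or real) accounts for them.

```python
# Toy model only: none of these names are PostgreSQL APIs.

class InvalidPage(Exception):
    """Stands in for 'PANIC: WAL contains references to invalid pages'."""


def replay(wal, nblocks):
    """Replay WAL records against a relation with nblocks blocks on disk.

    Records are (kind, arg) tuples:
      ("update", blkno)          the record touches block blkno
      ("tentative_truncate", n)  blocks >= n may already be gone
      ("truncate", n)            the relation was cut to n blocks
    Returns the relation length after replay.
    """
    missing = set()  # blocks referenced by the WAL but absent on disk
    for kind, arg in wal:
        if kind == "update":
            if arg >= nblocks:
                missing.add(arg)  # remember it; do not fail yet
        elif kind in ("tentative_truncate", "truncate"):
            # Either record explains why blocks >= arg are absent.
            missing = {b for b in missing if b < arg}
            if kind == "truncate":
                # Only the real record makes replay redo the truncate.
                nblocks = min(nblocks, arg)
    if missing:
        raise InvalidPage(f"unexplained missing blocks: {sorted(missing)}")
    return nblocks


if __name__ == "__main__":
    deletes = [("update", blk) for blk in range(3)]

    # Current ordering: file truncated to 0 blocks, crash before the xlog.
    try:
        replay(deletes, nblocks=0)
    except InvalidPage as exc:
        print("replay failed:", exc)

    # Tentative record was logged before the physical truncate.
    print(replay(deletes + [("tentative_truncate", 0)], nblocks=0))
```

In this model, replaying the delete records against an already truncated file raises InvalidPage when no truncate record follows, which mirrors the PANIC in the test case. With the tentative record in place, replay finishes whether or not the physical truncate actually happened, and because replay never performs the tentative truncate itself, a truncate that failed before the crash cannot fail again during recovery.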