On Mon, Nov 09, 2015 at 10:40:07PM +0100, Andres Freund wrote:
> 	/*
> 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
> 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
> 	 * for pg_clog, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
> 	 * has lsn_groups_per_page entries per buffer slot, each containing the
> 	 * highest LSN known for a contiguous group of SLRU entries on that slot's
> 	 * page.
> 	 */
> 	XLogRecPtr *group_lsn;
> 	int			lsn_groups_per_page;
>
> Uhm. multixacts historically didn't need to follow the
> write-WAL-before-data rule because it was zapped at restart. But it's
> now persistent.
>
> There are no comments about this choice anywhere in multixact.c, leading
> me to believe that this was not an intentional decision.

Here's the multixact.c comment justifying it:

 * XLOG interactions: this module generates an XLOG record whenever a new
 * OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
 * whenever a new MultiXactId is defined.  This allows us to completely
 * rebuild the data entered since the last checkpoint during XLOG replay.
 * Because this is possible, we need not follow the normal rule of
 * "write WAL before data"; the only correctness guarantee needed is that
 * we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
 * checkpoint is considered complete.  If a page does make it to disk ahead
 * of corresponding WAL records, it will be forcibly zeroed before use anyway.
 * Therefore, we don't need to mark our pages with LSN information; we have
 * enough synchronization already.

The comment's justification is incomplete, though.  What of pages filled over
the course of multiple checkpoint cycles?
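
The concern about pages that span several checkpoints can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: `ToyLsn`, `ToyPage` and `page_rezeroed_on_replay` are hypothetical names invented here, and the only fact assumed is that crash recovery replays WAL starting from the redo point of the last completed checkpoint. Under that assumption, the "forcibly zeroed before use" guarantee applies only when the page's zero-page record lies inside the replayed range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for XLogRecPtr; hypothetical, for illustration only. */
typedef uint64_t ToyLsn;

/* Hypothetical model of the WAL history of one OFFSETs/MEMBERs page. */
typedef struct
{
	ToyLsn		zero_page_lsn;	/* LSN of the record that zeroed the page */
} ToyPage;

/*
 * Replay covers records at or after redo_lsn.  The page is re-zeroed during
 * replay only if its zero-page record falls within that range.
 */
static bool
page_rezeroed_on_replay(const ToyPage *page, ToyLsn redo_lsn)
{
	return page->zero_page_lsn >= redo_lsn;
}
```

In this model, a page zeroed at LSN 100 is re-zeroed when the redo point is 50, but not when a later checkpoint has moved the redo point to 150. In the second case, whatever reached disk for that page ahead of its WAL stays as written; whether that is harmful in practice is the open question.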