On Mon, Nov 09, 2015 at 10:40:07PM +0100, Andres Freund wrote:
>       /*
>        * Optional array of WAL flush LSNs associated with entries in the SLRU
>        * pages.  If not zero/NULL, we must flush WAL before writing pages (true
>        * for pg_clog, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
>        * has lsn_groups_per_page entries per buffer slot, each containing the
>        * highest LSN known for a contiguous group of SLRU entries on that slot's
>        * page.
>        */
>       XLogRecPtr *group_lsn;
>       int                     lsn_groups_per_page;
> 
> Uhm. multixacts historically didn't need to follow the
> write-WAL-before-data rule because it was zapped at restart. But it's
> now persistent.
> 
> There are no comments about this choice anywhere in multixact.c, leading
> me to believe that this was not an intentional decision.

Here's the multixact.c comment justifying it:

 * XLOG interactions: this module generates an XLOG record whenever a new
 * OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
 * whenever a new MultiXactId is defined.  This allows us to completely
 * rebuild the data entered since the last checkpoint during XLOG replay.
 * Because this is possible, we need not follow the normal rule of
 * "write WAL before data"; the only correctness guarantee needed is that
 * we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
 * checkpoint is considered complete.  If a page does make it to disk ahead
 * of corresponding WAL records, it will be forcibly zeroed before use anyway.
 * Therefore, we don't need to mark our pages with LSN information; we have
 * enough synchronization already.

The comment's justification is incomplete, though.  What of pages filled over
the course of multiple checkpoint cycles?

