On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund <and...@2ndquadrant.com> wrote: > On 2014-03-03 06:57:00 -0500, Robert Haas wrote: >> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <and...@2ndquadrant.com> wrote: >> > I don't think this is neccessary >= 9.2. The are two only "interestings" >> > place >> > where PD_ALL_VISIBLE is set: >> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before* >> > PD_ALL_VISIBLE/the vm is touched and that causes recovery >> > conflicts. The heap page is locked for cleanup at that point. As the >> > logging of xl_heap_clean sets the page's LSN there's no way the page >> > can appear on the standby too early. >> > b) empty pages in lazy_scan_heap(). If they always were empty, there's >> > no need for conflicts. The only other way I can see to end up there >> > is a previous heap_page_prune() that repaired fragmentation. But that >> > logs a WAL record with conflict information. >> >> I don't think there's any reason to believe that lazy_scan_heap() can >> only hit pages that are empty or have just been defragged. Suppose >> that there's a tuple on the page which was recently inserted; the >> inserting transaction has committed but there are some backends that >> still have older snapshots. The page won't be marked all-visible >> because it isn't. Now, eventually those older snapshots will go away, >> and sometime after that the relation will get vacuumed again, and >> we'll once again look the page. But this time we notice that it is >> all-visible, and mark it so. > > Right now I am missing how this isn't an actual correctness problem > after a crash. Without an LSN interlock we could crash *after* the heap > page has been written out, but *before* the vm WAL record has been > flushed to disk.
Yes. In that case, the PD_ALL_VISIBLE bit will be set on the page, but the corresponding visibility map bit will be unset. > Combined with synchronous_commit=off there could be > transactions that appeared as safely committed for vacuum (i.e. are > below GetOldestXmin()), but which are actually aborted after the > commit. > Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN), > but that doesn't work here. Well, we'd better not try to mark a page all-visible if it's only all-visible on the assumption that unwritten xlog will be successfully flushed to disk. But lazy_scan_heap() has code that only regards the tuple as all-visible once the tuple is hinted committed, and there's code elsewhere to keep hint bits from being set too early. And heap_page_is_all_visible() follows the same pattern. So I think it's OK, but maybe you see something I don't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers