On Thu, Jun 23, 2011 at 6:40 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Thu, 2011-06-23 at 18:18 -0400, Robert Haas wrote:
>> Lazy VACUUM is the only thing that makes a page all visible. I don't
>> understand the part about snapshots.
>
> Lazy VACUUM is the only thing that _marks_ a page with PD_ALL_VISIBLE.
>
> After an INSERT to a new page, and after all snapshots are released, the
> page becomes all-visible; and thus subject to being marked with
> PD_ALL_VISIBLE by lazy vacuum without bumping the LSN. Note that there
> is no cleanup action that takes place here, so nothing else will bump
> the LSN either.
>
> So, let's say that we hypothetically had persistent snapshots, then
> you'd have the following problem:
>
> 1. INSERT to a new page, marking it with LSN X
> 2. WAL flushed to LSN Y (Y > X)
> 2. Some persistent snapshot (that doesn't see the INSERT) is released,
> and generates WAL recording that fact with LSN Z (Z > Y)
> 3. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> 4. page is written out because LSN is still X
> 5. crash
>
> Now, the persistent snapshot is still present because LSN Z never made
> it to disk; but the page is marked with PD_ALL_VISIBLE.
>
> Sure, if these hypothetical persistent snapshots were transactional, and
> if synchronous_commit is on, then LSN Z would be flushed before step 3;
> but that's another set of assumptions. That's why I left it simple and
> said that the assumption was "snapshots are released if there's a
> crash".

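To make the quoted sequence concrete, here is a rough toy model of it. This is only a sketch and not PostgreSQL code: ToyWAL, ToyPage, can_write_page and run_scenario are made-up names, LSNs are plain integers, and the bump_lsn_on_mark switch is a hypothetical knob standing in for "the action that depends on the snapshot release stamps the page with LSN Z".

```python
from dataclasses import dataclass


@dataclass
class ToyWAL:
    """Toy write-ahead log: LSNs are just increasing integers."""
    next_lsn: int = 1
    flushed_lsn: int = 0

    def insert_record(self) -> int:
        # Hand out the next LSN; the record is NOT yet on disk.
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def flush(self, upto: int) -> None:
        # Everything up to and including `upto` is now durable.
        self.flushed_lsn = max(self.flushed_lsn, upto)


@dataclass
class ToyPage:
    lsn: int = 0
    all_visible: bool = False


def can_write_page(page: ToyPage, wal: ToyWAL) -> bool:
    """WAL-before-data: a page may be written only once WAL is flushed
    at least as far as the page's LSN."""
    return page.lsn <= wal.flushed_lsn


def run_scenario(bump_lsn_on_mark: bool) -> bool:
    """Replay the quoted steps; return whether the marked page could be
    written out before LSN Z is durable."""
    wal = ToyWAL()
    page = ToyPage()

    x = wal.insert_record()      # INSERT to a new page, stamping LSN X
    page.lsn = x

    y = wal.insert_record()      # WAL flushed to LSN Y (Y > X)
    wal.flush(y)

    z = wal.insert_record()      # snapshot release logged at LSN Z, unflushed

    page.all_visible = True      # lazy VACUUM sets the all-visible mark
    if bump_lsn_on_mark:
        page.lsn = z             # hypothetical: the mark carries LSN Z

    return can_write_page(page, wal)


if __name__ == "__main__":
    print("write allowed without LSN bump:", run_scenario(False))
    print("write allowed with LSN bump:   ", run_scenario(True))
```

With the page LSN left at X the write is allowed even though Z is not yet on disk, which is the hazard described in the quoted steps. Once the page carries LSN Z, the same rule holds the write back until Z has been flushed.
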
I don't really think that's a separate set of assumptions - if we had
some system whereby snapshots could survive a crash, then they'd have to
be WAL-logged (because that's how we make things survive crashes). And
anything that is WAL-logged must obey the WAL-before-data rule. We
already have a system that ensures that when synchronous_commit=off,
CLOG pages can't be flushed before the corresponding WAL record makes it
to disk. For a system like the one you're describing, you'd need
something similar - these crash-surviving snapshots would have to make
sure that no action which depended on their state hit the disk before
the WAL record marking the state change hit the disk.

I guess the point you are driving at here is that a page can only go
from being all-visible to not-all-visible by virtue of being modified.
There's no other piece of state (like a persistent snapshot) that can be
lost as part of a crash that would make us need to change our mind and
decide that an all-visible XID is really not all-visible after all.
(The reverse is not true: since snapshots are ephemeral, a crash will
render every row either all-visible or dead.) I guess I never thought
about documenting that particular aspect of it because (to me) it seems
fairly self-evident. Maybe I'm wrong...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company