On Thu, Aug 9, 2012 at 12:43 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
>> So suppose that the following sequence of events occurs:
>>
>> 1. Tuple A on page 1 is updated. The new version, tuple B, is placed on
>> page 2.
>> 2. The table is vacuumed, removing tuple A.
>> 3. Page 1 is written durably to disk.
>> 4. Crash.
>>
>> If reconstructing tuple B requires possession of tuple A, it seems
>> that we are now screwed.
>
> Not with full_page_writes=on, as crash recovery will restore the old page
> contents. But you're right, with full_page_writes=off you are screwed.
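To restate that scenario concretely, here's a toy model of what delta-style
replay has to do. This is a standalone sketch, not the patch's actual code;
the Slot/UpdateRecord structures and replay_update() are made up purely for
illustration. Once the old version has been vacuumed away and its page has
already reached disk, there is nothing left to apply the delta against:

    /* Hypothetical, simplified model -- not PostgreSQL source.  It shows
     * why a delta-encoded update record cannot be replayed once VACUUM
     * has removed the old tuple and the page holding it was flushed. */
    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    #define TUPLE_LEN 32

    /* One tuple slot on a simulated page. */
    typedef struct
    {
        bool    in_use;
        char    data[TUPLE_LEN];
    } Slot;

    /* A delta-style update record: rebuild the new tuple by copying the
     * old version and overwriting the changed byte range. */
    typedef struct
    {
        int     old_page;
        int     old_slot;
        int     new_page;
        int     new_slot;
        int     delta_off;
        int     delta_len;
        char    delta[TUPLE_LEN];
    } UpdateRecord;

    static Slot pages[3][4];        /* a few pages, four slots each */

    /* Replay a delta update; fails if the old version is already gone. */
    static bool
    replay_update(const UpdateRecord *rec)
    {
        Slot   *oldslot = &pages[rec->old_page][rec->old_slot];
        Slot   *newslot = &pages[rec->new_page][rec->new_slot];

        if (!oldslot->in_use)
            return false;           /* old tuple was vacuumed away */

        memcpy(newslot->data, oldslot->data, TUPLE_LEN);
        memcpy(newslot->data + rec->delta_off, rec->delta, rec->delta_len);
        newslot->in_use = true;
        return true;
    }

    int
    main(void)
    {
        UpdateRecord rec = {1, 0, 2, 0, 0, 5, "WORLD"};

        /* 1. Tuple A lives on page 1; the update's WAL record stores only
         *    a delta against it, placing tuple B on page 2. */
        pages[1][0].in_use = true;
        memcpy(pages[1][0].data, "HELLO", 6);

        /* 2-3. VACUUM removes tuple A, and page 1 is written durably. */
        pages[1][0].in_use = false;

        /* 4. Crash; recovery now tries to replay the update record. */
        if (!replay_update(&rec))
            printf("cannot reconstruct tuple B: old version is gone\n");
        return 0;
    }

With full-page writes, the pre-update image of page 1 comes back from WAL
before this record is replayed, which is why the hole only opens up with
full_page_writes=off.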
I think the property that recovery only needs to worry about each block
individually is one that we want to preserve. Supporting this optimization
only when full_page_writes=on seems ugly, and I also agree with Simon's
objection upthread: the current design minimizes the chances of corruption
propagating from block to block.

Even if the proposed design is bullet-proof as of this moment (at least with
full_page_writes=on), it seems very possible that it could get accidentally
broken by future code changes, leading to hard-to-find data corruption bugs.
It might also complicate other things that we will want to do down the line,
such as parallelizing recovery.

In the pgbench testing I've done, almost all of the updates are HOT, provided
you run the test long enough to reach steady state, so restricting this
optimization to HOT updates shouldn't hurt that case (or similar real-world
cases) very much; a rough sketch of the check I have in mind is below my
signature. Of course, there are probably also real-world cases where HOT
applies only seldom, and those cases won't get the benefit of this, but you
can't win them all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
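P.S. To make the HOT restriction concrete, here is the kind of decision I
have in mind at WAL-record construction time. Again, this is an illustrative
sketch with made-up names (TupleVersion, can_use_delta_encoding), not actual
code: delta-encode only when the old and new versions share a block, and
fall back to logging the whole new tuple otherwise.

    /* Hypothetical sketch -- not PostgreSQL source.  Only emit a
     * delta-encoded WAL record when the update is HOT, i.e. the old and
     * new tuple versions live on the same block, so replay never needs
     * data from a second page. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t BlockNumber;

    typedef struct
    {
        BlockNumber block;          /* page holding this tuple version */
        bool        is_hot_update;  /* true when the update was HOT */
    } TupleVersion;

    /* Decide how to log an UPDATE: delta against the old version, or the
     * full new tuple.  Returning false means "log the whole tuple". */
    static bool
    can_use_delta_encoding(const TupleVersion *oldtup,
                           const TupleVersion *newtup)
    {
        /* Same-page (HOT) updates are safe: whatever crash recovery sees
         * on that one block is all it needs to rebuild the new version. */
        return newtup->is_hot_update && oldtup->block == newtup->block;
    }

    int
    main(void)
    {
        TupleVersion oldtup   = {.block = 1, .is_hot_update = false};
        TupleVersion hot_new  = {.block = 1, .is_hot_update = true};
        TupleVersion cold_new = {.block = 2, .is_hot_update = false};

        printf("HOT update  -> delta allowed: %d\n",
               can_use_delta_encoding(&oldtup, &hot_new));
        printf("cross-page  -> delta allowed: %d\n",
               can_use_delta_encoding(&oldtup, &cold_new));
        return 0;
    }

The cross-page case simply keeps today's behavior, so recovery of any one
block never depends on the contents of another.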