On Tue, Mar 2, 2021 at 5:28 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Tue, Feb 2, 2021 at 11:16 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > Right, the checkpoint itself is probably worse than this
> > "close-all-your-files!" thing in some cases [...]
>
> I've been wondering what obscure hazards these "tombstone" (for want
> of a better word) files guard against, besides the one described in
> the comments for mdunlink(). I've been thinking about various
> schemes that can be summarised as "put the tombstones somewhere else",
> but first... this is probably a stupid question, but what would break
> if we just ... turned all this stuff off when wal_level is high enough
> (as it is by default)?
>
> [0001-Make-relfile-tombstone-files-conditional-on-WAL-leve.not-for-cfbot-patch]
I had the opportunity to ask the inventor of UNLOGGED TABLEs, who answered my question with another question, something like, "yeah, but what about UNLOGGED TABLEs?". It seems to me that any schedule where a relfilenode is recycled should be recovered correctly, no matter what sequence of persistence levels is involved. If you dropped an UNLOGGED table, then its init fork is removed on commit, so a permanent table created later with the same relfilenode has no init fork and no data is eaten. The other way around (a permanent table dropped, then an UNLOGGED table created later with the same relfilenode), you get an init fork, and your table is reset on crash recovery, as it should be. It works because we still log and replay the create/drop; as far as I can see so far, it doesn't matter that we don't log the table's data.