On 2019-May-13, Andres Freund wrote:

> On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> > On 2019-May-13, Andres Freund wrote:
> > The first ResetUnloggedRelations call occurs before any WAL is replayed,
> > so the data dir is certainly still in an inconsistent state.  At that point,
> > we need the init fork files to be present, because the init files are the
> > indicators of what relations we need to delete the other forks for.
>
> Hm. I think this might be a self-made problem. For the main fork, we
> don't need this - if the init fork was created before the last
> checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
> WAL replay will recreate both main and init fork. So the problem is just
> that the VM fork might survive, because it'll not get nuked given the
> current arrangement. Is that what you're thinking about?

No, this wasn't what I was trying to explain.  Robert described it better.

> > Maybe we can do something lighter than a full immedsync of all the data
> > for the init file -- it would be sufficient to have the file *exist* --
> > but I'm not sure this optimization is worth anything.
>
> I don't think just that is sufficient in isolation for types of
> relations with metapages (e.g. btree) - the init fork contains data
> there.

No, I meant that when doing the initial cleanup (before WAL replay) we
only delete files; and for that we only need to know whether the table
is unlogged, and we know that by testing presence of the init file.  We
do need the contents *after* WAL replay, and for indexes we of course
need the actual contents of the init fork.

> > > Well, otherwise the relation won't exist on a standby? And if replay
> > > starts from before a database/tablespace creation we'd remove the init
> > > fork. So if it's not in the WAL, we'd lose it.
> >
> > Ah, of course.  Well, that needs to be in the comments then.
>
> I think it is?
>
>  * ... Recovery may as well remove it
>  * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
>  * record. Therefore, logging is necessary even if wal_level=minimal.

I meant the standby part.
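
To make the existence-vs-contents distinction above concrete, here is a
rough toy model in Python.  It is only a sketch, not the actual reinit
code: the data directory is modeled as a dict, the file names and
relfilenode numbers are made up, and the helper names (split_name,
cleanup_before_replay, reinit_after_replay) are hypothetical.  It assumes
the usual "<relfilenode>_<fork>" naming, with no suffix for the main fork.

```python
# Toy model of a data directory: file name -> file contents.
# All names and contents here are invented for illustration.

def split_name(name):
    """Return (relfilenode, fork); the main fork has no suffix."""
    rel, _, fork = name.partition("_")
    return rel, fork or "main"

def cleanup_before_replay(files):
    """Phase 1: drop every non-init fork of relations that have an init fork.

    Only the *existence* of "<rel>_init" is consulted, never its contents.
    """
    unlogged = {split_name(n)[0] for n in files if split_name(n)[1] == "init"}
    return {
        name: data
        for name, data in files.items()
        if split_name(name)[0] not in unlogged or split_name(name)[1] == "init"
    }

def reinit_after_replay(files):
    """Phase 2: recreate each unlogged relation's main fork from its init fork.

    Here the init fork's *contents* matter (e.g. an index metapage).
    """
    result = dict(files)
    for name, data in files.items():
        rel, fork = split_name(name)
        if fork == "init":
            result[rel] = data
    return result

datadir = {
    "16384": b"stale heap",      # unlogged table, main fork
    "16384_vm": b"stale vm",     # its visibility map fork
    "16384_init": b"",           # an empty init fork suffices for phase 1
    "16390": b"stale index",     # unlogged btree index, main fork
    "16390_init": b"metapage",   # contents are needed in phase 2
    "16400": b"logged table",    # no init fork, so it is left alone
}

after_cleanup = cleanup_before_replay(datadir)
after_reinit = reinit_after_replay(after_cleanup)
print(sorted(after_cleanup))
print(after_reinit["16390"])
```

In this model the first pass would work equally well if every init file
were empty; only the second pass reads what is in them, which is the case
that matters for indexes.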
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services