On Tue, May 21, 2019 at 4:17 AM Michael Paquier <mich...@paquier.xyz> wrote:
> > 2. Suppose the system reaches the end of
> > heapam_relation_set_new_filenode and then the system crashes. Because
> > of the smgrimmedsync(), and only because of the smgrimmedsync(), the
> > init fork would exist at the start of recovery. However, unlike the
> > heap, some of the index AMs don't actually have a call to
> > smgrimmedsync() in their *buildempty() functions. If I'm not
> > mistaken, the ones that write pages through shared_buffers do not do
> > smgrimmedsync, and the ones that use private buffers do perform
> > smgrimmedsync.
>
> Yes, that maps with what I can see in the code for the various empty()
> callbacks.
>
> > Therefore, the ones that write pages through
> > shared_buffers are vulnerable to a crash right after the unlogged
> > table is created. The init fork could fail to survive the crash, and
> > therefore the start-of-recovery code would again fail to know that
> > it's dealing with an unlogged relation.
>
> Taking the example of gist which uses shared buffers for the init
> fork logging, we take an exclusive lock on the buffer involved while
> logging the init fork, meaning that the checkpoint is not able to
> complete until the lock is released and the buffer is flushed. Do you
> have a specific sequence in mind?

Yes. I thought I had described it. You create an unlogged table, with
an index of a type that does not smgrimmedsync(), your transaction
commits, and then the system crashes, losing the _init fork for the
index. There's no checkpoint involved in this scenario, so any
argument that it can't be a problem because of checkpoints is
necessarily incorrect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
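
As a rough illustration of the sequence described above (unlogged table, index whose init fork is not smgrimmedsync()'d, commit, crash with no checkpoint), here is a toy model in Python. It is not PostgreSQL code: the ToyStorage class, its method names, and the fork labels are all made up for this sketch, and it assumes the simplest possible rule, namely that a fork survives a crash only if it was explicitly synced or a checkpoint flushed it first.

```python
class ToyStorage:
    """Toy durability model (hypothetical, not PostgreSQL internals)."""

    def __init__(self):
        self.volatile = set()  # written, but not guaranteed to be on disk
        self.durable = set()   # guaranteed to survive a crash

    def write_via_shared_buffers(self, fork):
        # Page is only dirtied in memory; nothing forces it to disk yet.
        self.volatile.add(fork)

    def write_and_immedsync(self, fork):
        # Stand-in for writing through private buffers plus an immediate sync.
        self.durable.add(fork)

    def checkpoint(self):
        # A checkpoint flushes everything that is still only in memory.
        self.durable |= self.volatile
        self.volatile.clear()

    def crash(self):
        # Anything not yet durable is lost; return what recovery would find.
        self.volatile.clear()
        return set(self.durable)


def run_scenario(with_checkpoint=False):
    storage = ToyStorage()
    storage.write_and_immedsync("heap_init")         # heap init fork is synced
    storage.write_via_shared_buffers("index_init")   # index init fork is not
    # The transaction commits here; in this model commit flushes nothing.
    if with_checkpoint:
        storage.checkpoint()
    return storage.crash()


if __name__ == "__main__":
    print(sorted(run_scenario()))
    print(sorted(run_scenario(with_checkpoint=True)))
```

In this model the index's init fork is lost whenever the crash comes before any checkpoint, which is the case the checkpoint-based argument does not cover.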