On Thu, Dec 8, 2016 at 7:49 PM Michael Paquier <michael.paqu...@gmail.com> wrote: > On Fri, Dec 9, 2016 at 4:54 AM, Robert Haas <robertmh...@gmail.com> wrote: > > On Wed, Dec 7, 2016 at 11:20 PM, Michael Paquier > > <michael.paqu...@gmail.com> wrote: > >> OK, I rewrote a bit the patch as attached. What do you think? > > > > Committed and back-patched all the way back to 9.2. > > Thanks!
This thread resulted in commit fa0f466d5329e10b16f3b38c8eaf5306f7e234e8, and today I had cause to look at that commit again. The code is now in heapam_relation_set_new_filenode. I noticed a few things that seem like they are not quite right. 1. The comment added in that commit says "Recover may as well remove it while replaying..." but what is really meant is "Recovery may well remove it well replaying..." The phrase "may as well" means that there isn't really any reason not to do it even if it's not strictly necessary. The phrase "may well" means that it's entirely possible. The latter meaning is the one we want here. 2. The comment as adjusted in that commit refers to needing an immediate sync "even if the page has been logged, because the write did not go through shared_buffers," but there is no page and no write, because an empty heap is just an empty file. That logic makes sense for index AMs that bypass shared buffers to write a metapage, e.g. nbtree, as opposed to others which go through shared_buffers and don't have the immediate sync, e.g. brin. However, the heap writes no pages either through shared buffers or bypassing shared buffers, so either I'm confused or the comment makes no sense. 3. Before that commit, the comment said that "the immediate sync is required, because otherwise there's no guarantee that this will hit the disk before the next checkpoint moves the redo pointer." However, that justification seems to apply to *anything* that does smgrcreate + log_smgrcreate would also need to do smgrimmedsync, and RelationCreateStorage doesn't. Unless, for some reason that I'm not thinking of right now, the init fork has stronger durability requirements that the main fork, it's hard to understand why heapam_relation_set_new_filenode can call RelationCreateStorage to do smgrcreate + log_smgrcreate for the main fork and that's OK, but when it does the same thing itself for the init fork, it now needs smgrimmedsync as well. My guess, just shooting from the hip, is that the smgrimmedsync call can be removed here. If that's wrong, then we need a better explanation for why it's needed, and we possibly need to add it to every single place that does smgrcreate that doesn't have it already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company