On Thu, Mar 11, 2021 at 2:00 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> On 2021/03/11 8:30, Thomas Munro wrote:
> > I've run into a couple of users who have just commented that recursive
> > fsync() code out!
>
> BTW, we can skip that recursive fsync() by disabling fsync GUC even without
> commenting out the code?

Those users wanted fsync=on because they wanted to recover to a normal online system after a crash, but they believed that the preceding fsync of the data directory was useless, because replaying the WAL should be enough. IMHO they were nearly on the right track, and the prototype patch I linked earlier as [2] was my attempt to find the specific reasons why that doesn't work and fix them.

So far, I've figured out that you still have to remember to fsync two things: the WAL files (otherwise you're replaying WAL that potentially hasn't reached the disk), and the data files holding blocks that recovery decided to skip due to BLK_DONE (otherwise you might skip replay because of a higher LSN on a page that is in the kernel's cache but not yet on disk).
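
To make those two cases concrete, here is a rough toy model in Python. It is not PostgreSQL code and not what the prototype patch does internally. The names plan_replay, sync_before_replay and fsync_path are made up for illustration, as is the shape of the record tuples. It assumes a POSIX system where a directory can be opened read-only and fsync'd, and it assumes a block is skipped whenever the LSN already on the page is at or past the record's LSN, which stands in for the BLK_DONE case.

```python
import os


def fsync_path(path):
    """fsync a file or directory by path (assumes POSIX semantics)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def plan_replay(records, page_lsns):
    """Toy stand-in for the redo decision (hypothetical helper).

    records:   iterable of (record_lsn, data_file, block_no)
    page_lsns: dict mapping (data_file, block_no) to the LSN found on the page
    Returns (records_to_apply, files_to_fsync).
    """
    to_apply = []
    files_to_fsync = set()
    for record_lsn, data_file, block_no in records:
        page_lsn = page_lsns.get((data_file, block_no), 0)
        if page_lsn >= record_lsn:
            # Analogous to BLK_DONE: the page already looks new enough, so
            # replay is skipped. That page may only be in the kernel's cache,
            # so remember the file and fsync it later.
            files_to_fsync.add(data_file)
        else:
            to_apply.append((record_lsn, data_file, block_no))
    return to_apply, files_to_fsync


def sync_before_replay(wal_dir, skipped_data_files):
    """fsync every WAL file, the WAL directory, then the skipped data files.

    Returns the list of paths that were synced, in order.
    """
    synced = []
    for name in sorted(os.listdir(wal_dir)):
        path = os.path.join(wal_dir, name)
        if os.path.isfile(path):
            fsync_path(path)
            synced.append(path)
    fsync_path(wal_dir)  # make the directory entries durable too
    synced.append(wal_dir)
    for path in sorted(set(skipped_data_files)):
        fsync_path(path)
        synced.append(path)
    return synced
```

In a real server the skipped files only become known while replay is running, so the fsync of those files would have to happen after (or during) recovery rather than up front; the sketch lumps the two together only to show which files are involved.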