Hello. At Mon, 19 Aug 2019 23:03:14 -0700, Noah Misch <n...@leadboat.com> wrote in <20190820060314.ga3086...@rfd.leadboat.com> > On Mon, Aug 19, 2019 at 06:59:59PM +0900, Kyotaro Horiguchi wrote: > > At Sat, 17 Aug 2019 20:52:30 -0700, Noah Misch <n...@leadboat.com> wrote in > > <20190818035230.gb3021...@rfd.leadboat.com> > > > For two-phase commit, PrepareTransaction() needs to execute pending syncs. > > > > Now TwoPhaseFileHeader has two new members for (commit-time) > > pending syncs. Pending-syncs are useless on wal-replay, but that > > is needed for commit-prepared. > > There's no need to modify TwoPhaseFileHeader or the COMMIT PREPARED sql > command, which is far too late to be syncing new relation files. (A crash may > have already destroyed their data.) PrepareTransaction(), which implements > the PREPARE TRANSACTION command, is the right place for these syncs. > > A failure in these new syncs needs to prevent the transaction from being > marked committed. Hence, in CommitTransaction(), these new syncs need to
Agreed. > happen after the last step that could create assign a new relfilenode and > before RecordTransactionCommit(). I suspect it's best to do it after > PreCommit_on_commit_actions() and before AtEOXact_LargeObject(). I don't find an obvious problem there. Since pending deletes and pending syncs are separately processed, I'm planning to make a separate list for syncs from deletes. > > > This should sync all forks, not just MAIN_FORKNUM. Code that writes WAL > > > for > > > FSM_FORKNUM and VISIBILITYMAP_FORKNUM checks RelationNeedsWAL(). There > > > may be > > > no bug today, but it's conceptually wrong to make RelationNeedsWAL() > > > return > > > false due to this code, use RelationNeedsWAL() for multiple forks, and > > > then > > > not actually sync all forks. > > > > I agree that all forks needs syncing, but FSM and VM are checking > > RelationNeedsWAL(modified). To make sure, are you suggesting to > > sync all forks instead of emitting WAL for them, or suggesting > > that VM and FSM to emit WALs even when the modified > > RelationNeedsWAL returns false (+ sync all forks)? > > I hadn't thought that far. What do you think is best? As in the latest patch, sync ALL forks then no WALs. We could skip syncing FSM but I'm not sure it's work doing. > > > The https://postgr.es/m/559fa0ba.3080...@iki.fi design had another > > > component > > > not appearing here. It said, "Instead, at COMMIT, we'd fsync() the > > > relation, > > > or if it's smaller than some threshold, WAL-log the contents of the whole > > > file > > > at that point." Please write the part to WAL-log the contents of small > > > files > > > instead of syncing them. > > > > I'm not sure the point of the behavior. I suppose that the "log" > > is a sequence of new_page records. It also needs to be synced and > > it is always larger than the file to be synced. I can't think of > > an appropriate threshold without the point. > > Yes, it would be a sequence of new-page records. FlushRelationBuffers() locks > every buffer header containing a buffer of the current database. The belief > has been that writing one page to xlog is cheaper than FlushRelationBuffers() > in a busy system with large shared_buffers. I'm at a loss.. The decision between WAL and sync is made at commit time, when we no longer have a pin on a buffer. When emitting WAL, opposite to the assumption, lock needs to be re-acquired for every page to emit log_new_page. What is worse, we may need to reload evicted buffers. If the file has been CopyFrom'ed, ring buffer strategy makes the situnation farther worse. That doesn't seem cheap at all.. If there were any chance on WAL for smaller files here, it would be on the files smaller than the ring size of bulk-write strategy(16MB). If we pick up every buffer page of the file instead of scanning through all buffers, that makes things worse by conflicts on partition locks. Any thoughts? # Sorry time's up today. regards. -- Kyotaro Horiguchi NTT Open Source Software Center