Hello.

At Sun, 12 May 2019 17:37:05 -0700, Noah Misch <n...@leadboat.com> wrote in <20190513003705.ga1202...@rfd.leadboat.com>
> On Sun, Mar 31, 2019 at 03:31:58PM -0700, Noah Misch wrote:
> > On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > > I also liked the design in the https://postgr.es/m/559fa0ba.3080...@iki.fi
> > > last paragraph, and I suspect it would have been no harder to back-patch.  I
> > > wonder if it would have been simpler and better, but I'm not asking anyone to
> > > investigate that.
> >
> > Now I am asking for that.  Would anyone like to try implementing that other
> > design, to see how much simpler it would be?

Yeah, I think it is a bit too complex for the value it brings. But I
think it is the best way as long as we keep reusing the file on
truncation of the whole file.

> Anyone?  I've been deferring review of v10 and v11 in hopes of seeing the
> above-described patch first.

A significant portion of the complexity in this patch comes from the
need to behave differently per block, according to the remembered
logged and truncated block numbers.

0005:
+ * NB: after WAL-logging has been skipped for a block, we must not WAL-log
+ * any subsequent actions on the same block either. Replaying the WAL record
+ * of the subsequent action might fail otherwise, as the "before" state of
+ * the block might not match, as the earlier actions were not WAL-logged.
+ * Likewise, after we have WAL-logged an operation for a block, we must
+ * WAL-log any subsequent operations on the same page as well. Replaying
+ * a possible full-page-image from the earlier WAL record would otherwise
+ * revert the page to the old state, even if we sync the relation at end
+ * of transaction.
+ *
+ * If a relation is truncated (without creating a new relfilenode), and we
+ * emit a WAL record of the truncation, we can't skip WAL-logging for any
+ * of the truncated blocks anymore, as replaying the truncation record will
+ * destroy all the data inserted after that. But if we have already decided
+ * to skip WAL-logging changes to a relation, and the relation is truncated,
+ * we don't need to WAL-log the truncation either.

If this consideration holds, and given the optimizations on WAL-skip
and truncation, there is no way to avoid the per-block behavior as long
as we allow a mixture of logged modifications and WAL-skipped COPY on
the same relation within a transaction.

We could avoid the per-block behavior by making the WAL inhibition work
on a per-relation basis. That would reduce the patch size by the amount
of BufferNeedsWALs and log_heap_update, but that is not a large
reduction.

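As a rough illustration of the per-relation idea, here is a minimal
sketch in C of the bookkeeping it would need. All names in it
(RelWalMode, RelWalState, rel_needs_wal and so on) are hypothetical and
do not come from the patch. It only models the rules listed right after
it, assuming that the first modification of a relation in a transaction
fixes the mode for the rest of that transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-relation WAL mode, tracked for one transaction. */
typedef enum RelWalMode
{
    REL_WAL_UNDECIDED,      /* relation not modified yet in this xact */
    REL_WAL_LOGGED,         /* a WAL-logged change happened: keep logging */
    REL_WAL_SKIPPED         /* a WAL-skipped change happened: keep skipping */
} RelWalMode;

typedef struct RelWalState
{
    RelWalMode  mode;
} RelWalState;

/*
 * Decide whether the next modification of the relation must be WAL-logged.
 * skip_allowed says whether this operation by itself would be allowed to
 * skip WAL (for example a COPY that qualifies for the optimization).
 * The first modification fixes the mode; later ones just follow it.
 */
bool
rel_needs_wal(RelWalState *rel, bool skip_allowed)
{
    assert(rel != NULL);

    if (rel->mode == REL_WAL_UNDECIDED)
        rel->mode = skip_allowed ? REL_WAL_SKIPPED : REL_WAL_LOGGED;

    return rel->mode == REL_WAL_LOGGED;
}

/* A WAL-skipped relation has to be synced at commit time. */
bool
rel_needs_sync_at_commit(const RelWalState *rel)
{
    assert(rel != NULL);
    return rel->mode == REL_WAL_SKIPPED;
}

/* Truncating a WAL-skipped relation creates a new relfilenode. */
bool
rel_truncate_needs_new_relfilenode(const RelWalState *rel)
{
    assert(rel != NULL);
    return rel->mode == REL_WAL_SKIPPED;
}
```

With this shape the decision is one flag per relation, so nothing has to
be remembered per block.
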
- Inhibit WAL-skipping after any WAL-logged modification in the relation.
- Inhibit WAL-logging after any WAL-skipped modification in the relation.
- WAL-skipped relations are synced at commit time.
- Truncation of a WAL-skipped relation creates a new relfilenode.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center