Hi, On Tue, 16 Jul 2024 at 15:19, Noah Misch <n...@leadboat.com> wrote: > > On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote: > > On Fri, 12 Jul 2024 at 02:52, Noah Misch <n...@leadboat.com> wrote: > > > On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote: > > > > --- a/src/backend/storage/aio/read_stream.c > > > > +++ b/src/backend/storage/aio/read_stream.c > > > > @@ -549,7 +549,7 @@ read_stream_begin_relation(int flags, > > > > { > > > > stream->ios[i].op.rel = rel; > > > > stream->ios[i].op.smgr = RelationGetSmgr(rel); > > > > - stream->ios[i].op.smgr_persistence = 0; > > > > + stream->ios[i].op.smgr_persistence = > > > > rel->rd_rel->relpersistence; > > > > > > Does the following comment in ReadBuffersOperation need an update? > > > > > > /* > > > * The following members should be set by the caller. If only > > > smgr is > > > * provided without rel, then smgr_persistence can be set to > > > override the > > > * default assumption of RELPERSISTENCE_PERMANENT. > > > */ > > > > I believe it does not need to be updated but I renamed > > 'ReadBuffersOperation.smgr_persistence' as > > 'ReadBuffersOperation.persistence'. So, this comment is updated as > > well. I think that rename suits better because persistence does not > > need to come from smgr, it could come from relation, too. Do you think > > it is a good idea? If it is, does it need a separate commit? > > The rename is good. I think the comment implies "persistence" is unused when > rel!=NULL. That implication is true before the patch but false after the > patch.
What makes it false after the patch? I think the logic did not change. If there is rel, the value of persistence is obtained from 'rel->rd_rel->relpersistence'. If there is no rel, then smgr is used to obtain its value. > > > > @@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator > > > > srclocator, > > > > > > > /* Iterate over each block of the source relation file. */ > > > > for (blkno = 0; blkno < nblocks; blkno++) > > > > { > > > > CHECK_FOR_INTERRUPTS(); > > > > > > > > /* Read block from source relation. */ > > > > - srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, > > > > blkno, > > > > - > > > > RBM_NORMAL, bstrategy_src, > > > > - > > > > permanent); > > > > + srcBuf = read_stream_next_buffer(src_stream, NULL); > > > > LockBuffer(srcBuf, BUFFER_LOCK_SHARE); > > > > > > I think this should check for read_stream_next_buffer() returning > > > InvalidBuffer. pg_prewarm doesn't, but the other callers do, and I think > > > the > > > other callers are a better model. LockBuffer() doesn't check the > > > InvalidBuffer case, so let's avoid the style of using a > > > read_stream_next_buffer() return value without checking. > > > > There is an assert in the LockBuffer which checks for the > > InvalidBuffer. If that is not enough, we may add an if check for > > InvalidBuffer but what should we do in this case? It should not > > happen, so erroring out may be a good idea. > > I like this style from read_stream_reset(): > > while ((buffer = read_stream_next_buffer(stream, NULL)) != > InvalidBuffer) > { > ... > } > > That is, don't iterate over block numbers. Drain the stream until empty. If > the stream returns a number of blocks higher or lower than we expected, we > won't detect that, and that's okay. It's not a strong preference, so I'm open > to arguments against that from you or others. A counterargument could be that > read_stream_reset() doesn't know the buffer count, so it has no choice. The > counterargument could say that callers knowing the block count should use the > pg_prewarm() style, and others should use the read_stream_reset() style. I think what you said in the counter argument makes sense. Also, there is an 'Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer);' after the loop. Which means all the blocks in the stream are read and there is no block left. v3 is attached. The only change is 'read_stream.c' changes in the 0003 are moved to 0002. -- Regards, Nazir Bilal Yavuz Microsoft
From 977cea9451e602d8d3d5e4f0cf3cd7dfea11879e Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz <byavu...@gmail.com> Date: Sun, 7 Apr 2024 22:33:36 +0300 Subject: [PATCH v3 1/3] Refactor PinBufferForBlock() to remove if checks about persistence There are if checks in PinBufferForBlock() function to set persistence of the relation and this function is called for the each block in the relation. Instead of that, set persistence of the relation before PinBufferForBlock() function. --- src/include/storage/bufmgr.h | 4 ++-- src/backend/storage/aio/read_stream.c | 2 +- src/backend/storage/buffer/bufmgr.c | 32 +++++++++++++-------------- 3 files changed, 18 insertions(+), 20 deletions(-) diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index a1e71013d32..ba9a1a289a9 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -116,12 +116,12 @@ struct ReadBuffersOperation { /* * The following members should be set by the caller. If only smgr is - * provided without rel, then smgr_persistence can be set to override the + * provided without rel, then persistence can be set to override the * default assumption of RELPERSISTENCE_PERMANENT. */ Relation rel; struct SMgrRelationData *smgr; - char smgr_persistence; + char persistence; ForkNumber forknum; BufferAccessStrategy strategy; diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index 74b9bae6313..58221649f27 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -551,7 +551,7 @@ read_stream_begin_relation(int flags, { stream->ios[i].op.rel = rel; stream->ios[i].op.smgr = RelationGetSmgr(rel); - stream->ios[i].op.smgr_persistence = 0; + stream->ios[i].op.persistence = rel->rd_rel->relpersistence; stream->ios[i].op.forknum = forknum; stream->ios[i].op.strategy = strategy; } diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 61816730955..7cbcdc63a6f 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1104,7 +1104,7 @@ ZeroAndLockBuffer(Buffer buffer, ReadBufferMode mode, bool already_valid) static pg_attribute_always_inline Buffer PinBufferForBlock(Relation rel, SMgrRelation smgr, - char smgr_persistence, + char persistence, ForkNumber forkNum, BlockNumber blockNum, BufferAccessStrategy strategy, @@ -1113,22 +1113,12 @@ PinBufferForBlock(Relation rel, BufferDesc *bufHdr; IOContext io_context; IOObject io_object; - char persistence; Assert(blockNum != P_NEW); - /* - * If there is no Relation it usually implies recovery and thus permanent, - * but we take an argument because CreateAndCopyRelationData can reach us - * with only an SMgrRelation for an unlogged relation that we don't want - * to flag with BM_PERMANENT. - */ - if (rel) - persistence = rel->rd_rel->relpersistence; - else if (smgr_persistence == 0) - persistence = RELPERSISTENCE_PERMANENT; - else - persistence = smgr_persistence; + Assert((persistence == RELPERSISTENCE_TEMP || + persistence == RELPERSISTENCE_PERMANENT || + persistence == RELPERSISTENCE_UNLOGGED)); if (persistence == RELPERSISTENCE_TEMP) { @@ -1203,6 +1193,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, ReadBuffersOperation operation; Buffer buffer; int flags; + char persistence; /* * Backward compatibility path, most code should use ExtendBufferedRel() @@ -1224,12 +1215,19 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, return ExtendBufferedRel(BMR_REL(rel), forkNum, strategy, flags); } + if (rel) + persistence = rel->rd_rel->relpersistence; + else if (smgr_persistence == 0) + persistence = RELPERSISTENCE_PERMANENT; + else + persistence = smgr_persistence; + if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)) { bool found; - buffer = PinBufferForBlock(rel, smgr, smgr_persistence, + buffer = PinBufferForBlock(rel, smgr, persistence, forkNum, blockNum, strategy, &found); ZeroAndLockBuffer(buffer, mode, found); return buffer; @@ -1241,7 +1239,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, flags = 0; operation.smgr = smgr; operation.rel = rel; - operation.smgr_persistence = smgr_persistence; + operation.persistence = persistence; operation.forknum = forkNum; operation.strategy = strategy; if (StartReadBuffer(&operation, @@ -1272,7 +1270,7 @@ StartReadBuffersImpl(ReadBuffersOperation *operation, buffers[i] = PinBufferForBlock(operation->rel, operation->smgr, - operation->smgr_persistence, + operation->persistence, operation->forknum, blockNum + i, operation->strategy, -- 2.45.2
From 408a9487f64397739f23839754000b28883d8ab6 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz <byavu...@gmail.com> Date: Wed, 17 Jul 2024 11:30:51 +0300 Subject: [PATCH v3 2/3] Add a way to create read stream object by using SMgrRelation Currently read stream object can be created only by using Relation, there is no way to create it by using SMgrRelation. To achieve that, read_stream_begin_impl() function is created and contents of the read_stream_begin_relation() is moved into this function. Both read_stream_begin_relation() and read_stream_begin_smgr_relation() calls read_stream_begin_impl() by Relation and SMgrRelation respectively. --- src/include/storage/read_stream.h | 10 ++++ src/backend/storage/aio/read_stream.c | 83 ++++++++++++++++++++++----- 2 files changed, 79 insertions(+), 14 deletions(-) diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h index f676d2cc20a..4e599904f26 100644 --- a/src/include/storage/read_stream.h +++ b/src/include/storage/read_stream.h @@ -15,6 +15,7 @@ #define READ_STREAM_H #include "storage/bufmgr.h" +#include "storage/smgr.h" /* Default tuning, reasonable for many users. */ #define READ_STREAM_DEFAULT 0x00 @@ -57,6 +58,15 @@ extern ReadStream *read_stream_begin_relation(int flags, void *callback_private_data, size_t per_buffer_data_size); extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data); +extern ReadStream *read_stream_begin_smgr_relation(int flags, + BufferAccessStrategy strategy, + SMgrRelation smgr, + char smgr_persistence, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size); +extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private); extern void read_stream_reset(ReadStream *stream); extern void read_stream_end(ReadStream *stream); diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index 58221649f27..bd50c55788e 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -406,14 +406,16 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice) * write extra data for each block into the space provided to it. It will * also receive callback_private_data for its own purposes. */ -ReadStream * -read_stream_begin_relation(int flags, - BufferAccessStrategy strategy, - Relation rel, - ForkNumber forknum, - ReadStreamBlockNumberCB callback, - void *callback_private_data, - size_t per_buffer_data_size) +static ReadStream * +read_stream_begin_impl(int flags, + BufferAccessStrategy strategy, + Relation rel, + SMgrRelation smgr, + char persistence, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size) { ReadStream *stream; size_t size; @@ -422,9 +424,6 @@ read_stream_begin_relation(int flags, int strategy_pin_limit; uint32 max_pinned_buffers; Oid tablespace_id; - SMgrRelation smgr; - - smgr = RelationGetSmgr(rel); /* * Decide how many I/Os we will allow to run at the same time. That @@ -434,7 +433,7 @@ read_stream_begin_relation(int flags, */ tablespace_id = smgr->smgr_rlocator.locator.spcOid; if (!OidIsValid(MyDatabaseId) || - IsCatalogRelation(rel) || + (rel && IsCatalogRelation(rel)) || IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber)) { /* @@ -550,8 +549,8 @@ read_stream_begin_relation(int flags, for (int i = 0; i < max_ios; ++i) { stream->ios[i].op.rel = rel; - stream->ios[i].op.smgr = RelationGetSmgr(rel); - stream->ios[i].op.persistence = rel->rd_rel->relpersistence; + stream->ios[i].op.smgr = smgr; + stream->ios[i].op.persistence = persistence; stream->ios[i].op.forknum = forknum; stream->ios[i].op.strategy = strategy; } @@ -559,6 +558,62 @@ read_stream_begin_relation(int flags, return stream; } +/* + * Create a new read stream for reading a relation. + * See read_stream_begin_impl() for the detailed explanation. + */ +ReadStream * +read_stream_begin_relation(int flags, + BufferAccessStrategy strategy, + Relation rel, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size) +{ + return read_stream_begin_impl(flags, + strategy, + rel, + RelationGetSmgr(rel), + rel->rd_rel->relpersistence, + forknum, + callback, + callback_private_data, + per_buffer_data_size); +} + +/* + * Create a new read stream for reading a SMgr relation. + * See read_stream_begin_impl() for the detailed explanation. + */ +ReadStream * +read_stream_begin_smgr_relation(int flags, + BufferAccessStrategy strategy, + SMgrRelation smgr, + char smgr_persistence, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size) +{ + char persistence; + + if (smgr_persistence == 0) + persistence = RELPERSISTENCE_PERMANENT; + else + persistence = smgr_persistence; + + return read_stream_begin_impl(flags, + strategy, + NULL, + smgr, + persistence, + forknum, + callback, + callback_private_data, + per_buffer_data_size); +} + /* * Pull one pinned buffer out of a stream. Each call returns successive * blocks in the order specified by the callback. If per_buffer_data_size was -- 2.45.2
From 2aee4cc322bf47f7eb82ad48a472d4c9f2706f1d Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz <byavu...@gmail.com> Date: Tue, 16 Jul 2024 18:11:21 +0300 Subject: [PATCH v3 3/3] Use read streams in CREATE DATABASE when strategy is wal_log CREATE DABASE command uses RelationCopyStorageUsingBuffer() function to copy buffers when the strategy is wal_log. This function reads source buffers then copies them to the destination buffers. Read streams are used only only while reading source buffers because the destination buffer is read by 'RBM_ZERO_AND_LOCK' option, so it is not important. --- src/backend/storage/buffer/bufmgr.c | 53 ++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 4 deletions(-) diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 7cbcdc63a6f..2acc109c4db 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -54,6 +54,7 @@ #include "storage/ipc.h" #include "storage/lmgr.h" #include "storage/proc.h" +#include "storage/read_stream.h" #include "storage/smgr.h" #include "storage/standby.h" #include "utils/memdebug.h" @@ -135,6 +136,33 @@ typedef struct SMgrSortArray SMgrRelation srel; } SMgrSortArray; +/* + * Helper struct for read stream object used in + * RelationCopyStorageUsingBuffer() function. + */ +struct copy_storage_using_buffer_read_stream_private +{ + BlockNumber blocknum; + BlockNumber last_block; +}; + +/* + * Callback function to get next block for read stream object used in + * RelationCopyStorageUsingBuffer() function. + */ +static BlockNumber +copy_storage_using_buffer_read_stream_next_block(ReadStream *stream, + void *callback_private_data, + void *per_buffer_data) +{ + struct copy_storage_using_buffer_read_stream_private *p = callback_private_data; + + if (p->blocknum < p->last_block) + return p->blocknum++; + + return InvalidBlockNumber; +} + /* GUC variables */ bool zero_damaged_pages = false; int bgwriter_lru_maxpages = 100; @@ -4688,6 +4716,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator, PGIOAlignedBlock buf; BufferAccessStrategy bstrategy_src; BufferAccessStrategy bstrategy_dst; + struct copy_storage_using_buffer_read_stream_private p; + ReadStream *src_stream; + SMgrRelation src_smgr; /* * In general, we want to write WAL whenever wal_level > 'minimal', but we @@ -4716,19 +4747,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator, bstrategy_src = GetAccessStrategy(BAS_BULKREAD); bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE); + /* Initalize streaming read */ + p.blocknum = 0; + p.last_block = nblocks; + src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER); + src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL, + bstrategy_src, + src_smgr, + permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED, + forkNum, + copy_storage_using_buffer_read_stream_next_block, + &p, + 0); + /* Iterate over each block of the source relation file. */ for (blkno = 0; blkno < nblocks; blkno++) { CHECK_FOR_INTERRUPTS(); /* Read block from source relation. */ - srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno, - RBM_NORMAL, bstrategy_src, - permanent); + srcBuf = read_stream_next_buffer(src_stream, NULL); LockBuffer(srcBuf, BUFFER_LOCK_SHARE); srcPage = BufferGetPage(srcBuf); - dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno, + dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, + BufferGetBlockNumber(srcBuf), RBM_ZERO_AND_LOCK, bstrategy_dst, permanent); dstPage = BufferGetPage(dstBuf); @@ -4748,6 +4791,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator, UnlockReleaseBuffer(dstBuf); UnlockReleaseBuffer(srcBuf); } + Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer); + read_stream_end(src_stream); FreeAccessStrategy(bstrategy_src); FreeAccessStrategy(bstrategy_dst); -- 2.45.2