On Tue, Apr 16, 2024 at 4:10 AM Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Apr 10, 2024 at 9:55 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > > To rescue my initdb --rel-segsize project[1] for v18, I will have a go > > at making that dynamic. It looks like we don't actually need to > > allocate that until we get to the GetFileBackupMethod() call, and at > > that point we have the file size. If I understand correctly, > > statbuf.st_size / BLCKSZ would be enough, so we could embiggen our > > block number buffer there if required, right? > > Yes.
Here is a first attempt at that. Better factoring welcome. New observations made along the way: the current coding can exceed MaxAllocSize and error out, or overflow 32 bit size_t and allocate nonsense. Do you think it'd be better to raise an error explaining that, or silently fall back to full backup (tried that way in the attached), or that + log messages? Another option would be to use huge allocations, so we only have to deal with that sort of question for 32 bit systems (i.e. effectively hypothetical/non-relevant scenarios), but I don't love that idea. > ... > I do understand that a 1GB segment size is not that big in 2024, and > that filesystems with a 2GB limit are thought to have died out a long > time ago, and I'm not against using larger segments. I do think, > though, that increasing the segment size by 32768x in one shot is > likely to be overdoing it. My intuition is that the primary interesting lines to cross are at 2GB and 4GB due to data type stuff. I defend against the most basic problem in my proposal: I don't let you exceed your off_t type, but that doesn't mean we don't have off_t/ssize_t/size_t/long snafus lurking in our code that could bite a 32 bit system with large files. If you make it past those magic numbers and your tools are all happy, I think you should be home free until you hit file system limits, which are effectively unhittable on most systems except ext4's already bemoaned 16TB limit AFAIK. But all the same, I'm contemplating limiting the range to 1TB in the first version, not because of general fear of unknown unknowns, but just because it means we don't need to use "huge" allocations for this known place, maybe until we can address that.
From 30efb6d39c83e9d4f338e10e1b8944c64f8799c5 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Wed, 15 May 2024 17:23:21 +1200 Subject: [PATCH v1] Limit block number buffer size in incremental backup. Previously, basebackup.c would allocate an array big enough to hold a block number for every block in a full sized md.c segment file. That works out to 512kB by default, which should be no problem. However, users can change the segment size at compile time, and the required space for the array could be many gigabytes, posing problems: 1. For segment sizes over 2TB, MaxAllocSize would be exceeded, raising an error. 2. For segment sizes over 8TB, size_t arithmetic would overflow on 32 bit systems, leading to the wrong size allocation. 3. For any very large segment size, it's non-ideal to allocate a huge amount of memory if you're not actually going to use it. This isn't a fundamental fix for the high memory requirement of the algorithm as coded, but it seems like a good idea to avoid those limits with a fallback strategy, and defer allocating until we see how big the segments actually are in the cluster being backed up, upsizing as required. These are mostly theoretical problems at the moment, because it is so hard to change the segment size in practice that people don't do it. A new proposal would make it changeable at run time, hence interest in tidying these rough edges up. Discussion: https://postgr.es/m/CA%2BhUKG%2B2hZ0sBztPW4mkLfng0qfkNtAHFUfxOMLizJ0BPmi5%2Bg%40mail.gmail.com diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 9a2bf59e84e..0703b77af94 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -34,6 +34,7 @@ #include "pgstat.h" #include "pgtar.h" #include "port.h" +#include "port/pg_bitutils.h" #include "postmaster/syslogger.h" #include "postmaster/walsummarizer.h" #include "replication/walsender.h" @@ -45,6 +46,7 @@ #include "storage/reinit.h" #include "utils/builtins.h" #include "utils/guc.h" +#include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/relcache.h" #include "utils/resowner.h" @@ -1198,13 +1200,7 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly, bool isGlobalDir = false; Oid dboid = InvalidOid; BlockNumber *relative_block_numbers = NULL; - - /* - * Since this array is relatively large, avoid putting it on the stack. - * But we don't need it at all if this is not an incremental backup. - */ - if (ib != NULL) - relative_block_numbers = palloc(sizeof(BlockNumber) * RELSEG_SIZE); + size_t relative_block_numbers_size = 0; /* * Determine if the current path is a database directory that can contain @@ -1477,6 +1473,7 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly, unsigned truncation_block_length = 0; char tarfilenamebuf[MAXPGPATH * 2]; char *tarfilename = pathbuf + basepathlen + 1; + size_t stat_blocks; FileBackupMethod method = BACK_UP_FILE_FULLY; if (ib != NULL && isRelationFile) @@ -1499,6 +1496,48 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly, lookup_path = pstrdup(tarfilename); } + /* If we have a block number array but it's too small, free. */ + stat_blocks = statbuf.st_size / BLCKSZ; + if (relative_block_numbers && + relative_block_numbers_size < stat_blocks) + { + pfree(relative_block_numbers); + relative_block_numbers = NULL; + } + + /* + * Make a new block number buffer if we need one, being + * careful to avoid allocating memory for unexpectedly + * oversized files. Also avoid exceeding the artifical + * allocation size cap MaxAllocSize. + * + * We could use huge allocations to get around that, and if we + * consider changing that, we should remember to clamp instead + * for SIZE_MAX, for the sake of 32 bit systems. + * + * If we decline to allocate an array here, + * GetFileBackupMethod will force a full backup. + */ + if (!relative_block_numbers && + stat_blocks > 0 && + stat_blocks <= RELSEG_SIZE && + stat_blocks * sizeof(BlockNumber) <= MaxAllocSize) + { + uint64 size; + + /* + * Round up to next power of two, to amortize reallocation + * overheads. Clamp so we don't exceed the maximum + * potential useful size, or the allocation limit. + */ + size = pg_nextpower2_64(stat_blocks); + size = Min(RELSEG_SIZE, size); + size = Min(MaxAllocSize / sizeof(BlockNumber), size); + + relative_block_numbers = palloc(sizeof(BlockNumber) * size); + relative_block_numbers_size = size; + } + method = GetFileBackupMethod(ib, lookup_path, dboid, relspcoid, relfilenumber, relForkNum, segno, statbuf.st_size, diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c index 4ad9d7cad00..fe776e2b2de 100644 --- a/src/backend/backup/basebackup_incremental.c +++ b/src/backend/backup/basebackup_incremental.c @@ -736,7 +736,7 @@ GetIncrementalFilePath(Oid dboid, Oid spcoid, RelFileNumber relfilenumber, * an incremental file in the backup instead of the entire file. On return, * *num_blocks_required will be set to the number of blocks that need to be * sent, and the actual block numbers will have been stored in - * relative_block_numbers, which should be an array of at least RELSEG_SIZE. + * relative_block_numbers, which should be an array of at least size / BLCKSZ. * In addition, *truncation_block_length will be set to the value that should * be included in the incremental file. */ @@ -768,10 +768,12 @@ GetFileBackupMethod(IncrementalBackupInfo *ib, const char *path, Assert(RelFileNumberIsValid(relfilenumber)); /* - * If the file size is too large or not a multiple of BLCKSZ, then + * If the file size is too large or not a multiple of BLCKSZ, or the + * caller was unable to allocate a suitably sized block number array, then * something weird is happening, so give up and send the whole file. */ - if ((size % BLCKSZ) != 0 || size / BLCKSZ > RELSEG_SIZE) + if ((size % BLCKSZ) != 0 || size / BLCKSZ > RELSEG_SIZE || + !relative_block_numbers) return BACK_UP_FILE_FULLY; /* -- 2.39.2