Andres Freund <and...@anarazel.de> writes: > On 2019-08-16 17:31:49 -0400, Tom Lane wrote: >> I fear that allowing pg_do_encoding_conversion to return strings longer >> than 1GB is just going to create failure cases somewhere else. >> >> However, it's certainly true that 4x growth is a pretty unlikely worst >> case. Maybe we could do something like >> 1. If string is short (say up to a few megabytes), continue to do it >> like now. This avoids adding overhead for typical cases. >> 2. Otherwise, run some lobotomized form of encoding conversion that >> just computes the space required (as an int64, I guess) without saving >> the result anywhere. >> 3. If space required > 1GB, fail. >> 4. Otherwise, allocate just the space required, and convert.
> It's probably too big a hammer for this specific case, but I think at > some point we ought to stop using fixed size allocations for this kind > of work. Instead we should use something roughly like our StringInfo, > except that when exceeding the current size limit, the overflowing data > is stored in a separate allocation. And only once we actually need the > data in a consecutive form, we allocate memory that's large enough to > store the all the separate allocations in their entirety. That sounds pretty messy :-(. I spent some time looking at what I proposed above, and concluded that it's probably impractical. In the first place, we'd have to change the API spec for encoding conversion functions. Now maybe that would not be a huge deal, because there likely aren't very many people outside the core code who are defining their own conversion functions, but it's still a negative. More importantly, unless we wanted to duplicate large swaths of code, we'd end up inserting changes about like this into the inner loops of encoding conversions: - *dest++ = code; + if (dest) + *dest++ = code; + outcount++; which seems like it'd be bad for performance. So I now think that Alvaro's got basically the right idea, except that I'm still afraid to allow strings larger than MaxAllocSize to run around loose in the backend. So in addition to the change he suggested, we need a final check on strlen(result) not being too large. We can avoid doing a useless strlen() if the input len is small, though. It then occurred to me that we could also repalloc the output buffer down to just the required size, which is pointless if it's small but not if we can give back several hundred MB. This is conveniently mergeable with the check to see whether we need to check strlen or not. ... or at least, that's what I thought we could do. Testing showed me that AllocSetRealloc never actually gives back any space, even when it's just acting as a frontend for a direct malloc. However, we can fix that for little more than the price of swapping the order of the is-it-a-decrease and is-it-a-large-chunk stanzas, as in the 0002 patch below. I also put back the missing overflow check --- although that's unreachable in a 64-bit machine, it's not at all in 32-bit. The patch is still useful in 32-bit though, since it still doubles the size of string we can cope with. I think this is committable, though surely another pair of eyeballs on it wouldn't hurt. Also, is it worth having a different error message for the case where the output does exceed MaxAllocSize? regards, tom lane
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c index bec54bb..6b08b77 100644 --- a/src/backend/utils/mb/mbutils.c +++ b/src/backend/utils/mb/mbutils.c @@ -349,16 +349,24 @@ pg_do_encoding_conversion(unsigned char *src, int len, pg_encoding_to_char(dest_encoding)))); /* - * Allocate space for conversion result, being wary of integer overflow + * Allocate space for conversion result, being wary of integer overflow. + * + * len * MAX_CONVERSION_GROWTH is typically a vast overestimate of the + * required space, so it might exceed MaxAllocSize even though the result + * would actually fit. We do not want to hand back a result string that + * exceeds MaxAllocSize, because callers might not cope gracefully --- but + * if we just allocate more than that, and don't use it, that's fine. */ - if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH)) + if ((Size) len >= (MaxAllocHugeSize / (Size) MAX_CONVERSION_GROWTH)) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("out of memory"), errdetail("String of %d bytes is too long for encoding conversion.", len))); - result = palloc(len * MAX_CONVERSION_GROWTH + 1); + result = (unsigned char *) + MemoryContextAllocHuge(CurrentMemoryContext, + (Size) len * MAX_CONVERSION_GROWTH + 1); OidFunctionCall5(proc, Int32GetDatum(src_encoding), @@ -366,6 +374,26 @@ pg_do_encoding_conversion(unsigned char *src, int len, CStringGetDatum(src), CStringGetDatum(result), Int32GetDatum(len)); + + /* + * If the result is large, it's worth repalloc'ing to release any extra + * space we asked for. The cutoff here is somewhat arbitrary, but we + * *must* check when len * MAX_CONVERSION_GROWTH exceeds MaxAllocSize. + */ + if (len > 1000000) + { + Size resultlen = strlen((char *) result); + + if (resultlen >= MaxAllocSize) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("out of memory"), + errdetail("String of %d bytes is too long for encoding conversion.", + len))); + + result = (unsigned char *) repalloc(result, resultlen + 1); + } + return result; } @@ -682,16 +710,19 @@ perform_default_encoding_conversion(const char *src, int len, return unconstify(char *, src); /* - * Allocate space for conversion result, being wary of integer overflow + * Allocate space for conversion result, being wary of integer overflow. + * See comments in pg_do_encoding_conversion. */ - if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH)) + if ((Size) len >= (MaxAllocHugeSize / (Size) MAX_CONVERSION_GROWTH)) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("out of memory"), errdetail("String of %d bytes is too long for encoding conversion.", len))); - result = palloc(len * MAX_CONVERSION_GROWTH + 1); + result = (char *) + MemoryContextAllocHuge(CurrentMemoryContext, + (Size) len * MAX_CONVERSION_GROWTH + 1); FunctionCall5(flinfo, Int32GetDatum(src_encoding), @@ -699,6 +730,25 @@ perform_default_encoding_conversion(const char *src, int len, CStringGetDatum(src), CStringGetDatum(result), Int32GetDatum(len)); + + /* + * Release extra space if there might be a lot --- see comments in + * pg_do_encoding_conversion. + */ + if (len > 1000000) + { + Size resultlen = strlen(result); + + if (resultlen >= MaxAllocSize) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("out of memory"), + errdetail("String of %d bytes is too long for encoding conversion.", + len))); + + result = (char *) repalloc(result, resultlen + 1); + } + return result; }
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 6b63d6f..5e964ea 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -1084,62 +1084,12 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) set->header.name, chunk); #endif - /* - * Chunk sizes are aligned to power of 2 in AllocSetAlloc(). Maybe the - * allocated area already is >= the new size. (In particular, we always - * fall out here if the requested size is a decrease.) - */ - if (oldsize >= size) - { -#ifdef MEMORY_CONTEXT_CHECKING - Size oldrequest = chunk->requested_size; - -#ifdef RANDOMIZE_ALLOCATED_MEMORY - /* We can only fill the extra space if we know the prior request */ - if (size > oldrequest) - randomize_mem((char *) pointer + oldrequest, - size - oldrequest); -#endif - - chunk->requested_size = size; - - /* - * If this is an increase, mark any newly-available part UNDEFINED. - * Otherwise, mark the obsolete part NOACCESS. - */ - if (size > oldrequest) - VALGRIND_MAKE_MEM_UNDEFINED((char *) pointer + oldrequest, - size - oldrequest); - else - VALGRIND_MAKE_MEM_NOACCESS((char *) pointer + size, - oldsize - size); - - /* set mark to catch clobber of "unused" space */ - if (size < oldsize) - set_sentinel(pointer, size); -#else /* !MEMORY_CONTEXT_CHECKING */ - - /* - * We don't have the information to determine whether we're growing - * the old request or shrinking it, so we conservatively mark the - * entire new allocation DEFINED. - */ - VALGRIND_MAKE_MEM_NOACCESS(pointer, oldsize); - VALGRIND_MAKE_MEM_DEFINED(pointer, size); -#endif - - /* Disallow external access to private part of chunk header. */ - VALGRIND_MAKE_MEM_NOACCESS(chunk, ALLOCCHUNK_PRIVATE_LEN); - - return pointer; - } - if (oldsize > set->allocChunkLimit) { /* * The chunk must have been allocated as a single-chunk block. Use - * realloc() to make the containing block bigger with minimum space - * wastage. + * realloc() to make the containing block bigger, or smaller, with + * minimum space wastage. */ AllocBlock block = (AllocBlock) (((char *) chunk) - ALLOC_BLOCKHDRSZ); Size chksize; @@ -1153,11 +1103,19 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) if (block->aset != set || block->freeptr != block->endptr || block->freeptr != ((char *) block) + - (chunk->size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ)) + (oldsize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ)) elog(ERROR, "could not find block containing chunk %p", chunk); + /* + * Even if the new request is less than set->allocChunkLimit, we stick + * with the single-chunk block approach. Therefore we need + * chunk->size to be bigger than set->allocChunkLimit, so we don't get + * confused about the chunk's status. + */ + chksize = Max(size, set->allocChunkLimit + 1); + chksize = MAXALIGN(chksize); + /* Do the realloc */ - chksize = MAXALIGN(size); blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ; block = (AllocBlock) realloc(block, blksize); if (block == NULL) @@ -1182,17 +1140,19 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) #ifdef MEMORY_CONTEXT_CHECKING #ifdef RANDOMIZE_ALLOCATED_MEMORY /* We can only fill the extra space if we know the prior request */ - randomize_mem((char *) pointer + chunk->requested_size, - size - chunk->requested_size); + if (size > chunk->requested_size) + randomize_mem((char *) pointer + chunk->requested_size, + size - chunk->requested_size); #endif /* - * realloc() (or randomize_mem()) will have left the newly-allocated + * realloc() (or randomize_mem()) will have left any newly-allocated * part UNDEFINED, but we may need to adjust trailing bytes from the * old allocation. */ - VALGRIND_MAKE_MEM_UNDEFINED((char *) pointer + chunk->requested_size, - oldsize - chunk->requested_size); + if (oldsize > chunk->requested_size) + VALGRIND_MAKE_MEM_UNDEFINED((char *) pointer + chunk->requested_size, + oldsize - chunk->requested_size); chunk->requested_size = size; @@ -1217,6 +1177,56 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) return pointer; } + + /* + * Chunk sizes are aligned to power of 2 in AllocSetAlloc(). Maybe the + * allocated area already is >= the new size. (In particular, we will + * fall out here if the requested size is a decrease.) + */ + if (oldsize >= size) + { +#ifdef MEMORY_CONTEXT_CHECKING + Size oldrequest = chunk->requested_size; + +#ifdef RANDOMIZE_ALLOCATED_MEMORY + /* We can only fill the extra space if we know the prior request */ + if (size > oldrequest) + randomize_mem((char *) pointer + oldrequest, + size - oldrequest); +#endif + + chunk->requested_size = size; + + /* + * If this is an increase, mark any newly-available part UNDEFINED. + * Otherwise, mark the obsolete part NOACCESS. + */ + if (size > oldrequest) + VALGRIND_MAKE_MEM_UNDEFINED((char *) pointer + oldrequest, + size - oldrequest); + else + VALGRIND_MAKE_MEM_NOACCESS((char *) pointer + size, + oldsize - size); + + /* set mark to catch clobber of "unused" space */ + if (size < oldsize) + set_sentinel(pointer, size); +#else /* !MEMORY_CONTEXT_CHECKING */ + + /* + * We don't have the information to determine whether we're growing + * the old request or shrinking it, so we conservatively mark the + * entire new allocation DEFINED. + */ + VALGRIND_MAKE_MEM_NOACCESS(pointer, oldsize); + VALGRIND_MAKE_MEM_DEFINED(pointer, size); +#endif + + /* Disallow external access to private part of chunk header. */ + VALGRIND_MAKE_MEM_NOACCESS(chunk, ALLOCCHUNK_PRIVATE_LEN); + + return pointer; + } else { /*