Hi, Tomas queried[1] the limit of 256kB (or really 32 blocks) for io_combine_limit. Yeah, I think we should increase it and allow experimentation with larger numbers. Note that real hardware and protocols have segment and size limits that can force the kernel to split your I/Os, so it's not at all a given that it'll help much or at all to use very large sizes, but YMMV. I was originally cautious because I didn't want to make a few stack buffers too big, but arrays of BlockNumber, struct iovec, and pointer don't seem too excessive at say 128 (cf whole blocks on the stack, a thing we do, which would still be many times larger that the relevant arrays). I was also anticipating future code that would need to multiply that number by other terms to allocate shared memory, but after some off-list discussion, that seems OK: such code should be able to deal with that using GUCs instead of maximally pessimal allocation. 128 gives a nice round number of 1M as a maximum transfer size, and comparable systems seem to have upper limits around that mark. Patch attached.
[1] https://www.postgresql.org/message-id/flat/85696b8e-f1bf-459e-ba97-5608c644c185%40vondra.me#a4c98be8b55095ce14897dab4793c255
From e8bb23b2130023ec497855c239b0299421b6833d Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 21 Jan 2025 11:44:35 +1300 Subject: [PATCH] Increase io_combine_limit maximum to 1MB. The default value of 128kB is not changed, but the fairly arbitrary upper limit is changed from 256kB to 1MB (assuming default block size, and assuming the operating system doesn't limit us to a smaller size). Whether your system can really do 1MB physical transfers without splitting them up is another matter, but at least having the option to try is useful. Other RDBMSes seem to have limits around that number. The concrete change is to our definition of PG_IOV_MAX, which in turn controls MAX_IO_COMBINE_LIMIT. The latter limits the GUC used for buffer pool I/O, but the former also affects a couple of other places that work with arrays of struct iovec or smaller objects on the stack and always use the compile-time maximum. Therefore we still don't want to use IOV_MAX directly, which is not under our control and likely to be an unnecessarily high 1024. 128 blocks seems acceptable though. The only current Unix on our target list that is known to have a very low IOV_MAX is older Solaris, though even it changed recently. Solaris 11.4 SRU69 added preadv/pwritev, 11.4 SRU72 bumped IOV_MAX from 16 to 1024. Our Solaris build farm animals don't have either of those updates yet so are currently testing the fallback code (like Windows). Note that its open source cousin illumos already made equivalent changes years ago. For Windows, we can't use real scatter/gather yet (though it's possible, in later work), so we continue to provide an IOV_MAX value of 16 and emulate preadv()/pwritev() with loops. It's debatable whether we should increase that number too: it still benefits from I/O combining sometimes when buffers happen to be consecutive in memory. Someone would need to research that. This change also makes it theoretically possible for read_stream.c's internal cap of INT16_MAX to be hit. With effective_io_concurrency set to 1000 and io_combine_limit set to 1MB, you could theoretically want to pin 128K buffers at once (= 1GB of data), but there is a cap at ~32K buffers that stems from the choice of data type. That could be revisited in future, but in practice several other limits are very likely to kick in first. Suggested-by: Tomas Vondra <to...@vondra.me> --- doc/src/sgml/config.sgml | 2 ++ src/backend/storage/aio/read_stream.c | 9 ++++++--- src/backend/utils/misc/postgresql.conf.sample | 2 +- src/include/port/pg_iovec.h | 8 ++++++-- 4 files changed, 15 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 38244409e3c..5f7d1e41951 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2614,6 +2614,8 @@ include_dir 'conf.d' <listitem> <para> Controls the largest I/O size in operations that combine I/O. + The maximum allowed size depends on the operating system and block + size, but is typically 1MB on Unix and 128kB on Windows. The default is 128kB. </para> </listitem> diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index e4414b2e915..b5a8f3ed941 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -438,9 +438,12 @@ read_stream_begin_impl(int flags, * Choose the maximum number of buffers we're prepared to pin. We try to * pin fewer if we can, though. We clamp it to at least io_combine_limit * so that we can have a chance to build up a full io_combine_limit sized - * read, even when max_ios is zero. Be careful not to allow int16 to - * overflow (even though that's not possible with the current GUC range - * limits), allowing also for the spare entry and the overflow space. + * read, even when max_ios is zero. + * + * Be careful not to allow int16 to overflow. That is possible with the + * current GUC range limits, so this is an artificial limit of ~32k + * buffers and we'd need to adjust the types to exceed that. We also have + * to allow for the spare entry and the overflow space. */ max_pinned_buffers = Max(max_ios * 4, io_combine_limit); max_pinned_buffers = Min(max_pinned_buffers, diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index c40b7a3121e..bd2fca53612 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -197,7 +197,7 @@ #backend_flush_after = 0 # measured in pages, 0 disables #effective_io_concurrency = 1 # 1-1000; 0 disables prefetching #maintenance_io_concurrency = 10 # 1-1000; 0 disables prefetching -#io_combine_limit = 128kB # usually 1-32 blocks (depends on OS) +#io_combine_limit = 128kB # usually 1-128 blocks (depends on OS) #max_worker_processes = 8 # (change requires restart) #max_parallel_workers_per_gather = 2 # limited by max_parallel_workers #max_parallel_maintenance_workers = 2 # limited by max_parallel_workers diff --git a/src/include/port/pg_iovec.h b/src/include/port/pg_iovec.h index d9891d3805d..426ef978301 100644 --- a/src/include/port/pg_iovec.h +++ b/src/include/port/pg_iovec.h @@ -33,8 +33,12 @@ struct iovec #endif -/* Define a reasonable maximum that is safe to use on the stack. */ -#define PG_IOV_MAX Min(IOV_MAX, 32) +/* + * Define a reasonable maximum that is safe to use on the stack in arrays of + * struct iovec and other small types. The operating system may limit us to a + * lower value, though most systems have 1024. + */ +#define PG_IOV_MAX Min(IOV_MAX, 128) /* * Like preadv(), but with a prefix to remind us of a side-effect: on Windows -- 2.39.5