On Thu, Dec 12, 2024 at 1:18 AM Ryohei Takahashi (Fujitsu)
<r.takahash...@fujitsu.com> wrote:
> The performance of PG16.6 and PG17.0 are worse than PG16.4.
> So, I think the commits between August and September affects the performance.
> I will analyze these commits.

If it reproduces reliably, maybe git bisect?  Do you have a profiler?
Can you show the system call trace for good and bad behaviour?  But I
wonder if there might just be some weird code placement variation
causing arbitrary performance changes, because nothing is jumping out
of that version range when I look at it...  How do other versions, .0,
.1, .2, .3 perform?  What about 15.x?

Just by the way, in case you are interested in the broader topic of
bulk file extension, here are some ideas that might be worth trying
out on a serious Windows server (maybe later once the unexpected
regression is understood):

1.  Those code paths finish up in pg_pwritev(), but it has a loop over
8kB writes on Windows.  Does it help if we just make "zbuffer" bigger?
How big?

2.  While pondering the goals of posix_fallocate(), I had a
realisation about how we might implement FileFallocate() on Windows.
Does this idea work?  Well?

Experiment-grade patches attached.

From cc5c91f1e16ad1335cb2efda67576fa419476d2a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 12 Dec 2024 15:51:21 +1300
Subject: [PATCH] Use bigger writes in pg_pwrite_zeros() on Windows.

XXX Is this helpful?
---
 src/common/file_utils.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 398fe1c334d..0de8b5ffb31 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -687,8 +687,16 @@ pg_pwritev_with_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset)
 ssize_t
 pg_pwrite_zeros(int fd, size_t size, off_t offset)
 {
-	static const PGIOAlignedBlock zbuffer = {{0}};	/* worth BLCKSZ */
-	void	   *zerobuf_addr = unconstify(PGIOAlignedBlock *, &zbuffer)->data;
+	/*
+	 * On Windows, pg_pwritev() isn't a system call, it's a loop.  It might be
+	 * worth wasting more memory on zero buffers to get fewer loops.
+	 */
+#ifdef WIN32
+	static const PGIOAlignedBlock zbuffer[8] = {{{0}}};
+#else
+	static const PGIOAlignedBlock zbuffer[1] = {{{0}}};
+#endif
+	void	   *zerobuf_addr = unconstify(PGIOAlignedBlock *, &zbuffer[0])->data;
 	struct iovec iov[PG_IOV_MAX];
 	size_t		remaining_size = size;
 	ssize_t		total_written = 0;
@@ -703,13 +711,8 @@ pg_pwrite_zeros(int fd, size_t size, off_t offset)
 		{
 			size_t		this_iov_size;
 
+			this_iov_size = Min(remaining_size, sizeof(zbuffer));
 			iov[iovcnt].iov_base = zerobuf_addr;
-
-			if (remaining_size < BLCKSZ)
-				this_iov_size = remaining_size;
-			else
-				this_iov_size = BLCKSZ;
-
 			iov[iovcnt].iov_len = this_iov_size;
 			remaining_size -= this_iov_size;
 		}
-- 
2.39.5

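
To get a feel for what a bigger "zbuffer" might buy, here is a rough standalone sketch (not part of the patches) that counts how many per-iovec writes a Windows-style loop would issue to zero a given range. SKETCH_BLCKSZ and sketch_zero_write_calls() are hypothetical names invented for this illustration, and the 8192-byte block size is an assumption (the default BLCKSZ). It only does the arithmetic and says nothing about whether fewer, larger writes are actually faster.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the default PostgreSQL block size; real builds may differ. */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical helper: number of write calls needed to zero "size" bytes
 * when each call can cover at most "zbuf_blocks" blocks of zeros, i.e. the
 * ceiling of size / (zbuf_blocks * SKETCH_BLCKSZ).
 */
size_t
sketch_zero_write_calls(size_t size, size_t zbuf_blocks)
{
	size_t		chunk;

	assert(zbuf_blocks > 0);
	chunk = zbuf_blocks * SKETCH_BLCKSZ;

	return (size + chunk - 1) / chunk;
}
```

For a 1MB extension this gives 128 calls with a one-block buffer and 16 calls with an eight-block buffer, which is the reduction the patch is fishing for.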
From aa750fd3ba0eb4ee652bdb13a4564c33ede44e04 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 12 Dec 2024 17:32:31 +1300
Subject: [PATCH] Implement FileFallocate() for Windows.

XXX Does this work, and is it beneficial?

XXX Would the slight non-atomicity break any user in PostgreSQL?  I doubt
it...
---
 src/backend/storage/file/fd.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 7c403fb360e..9b2a4a13f39 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2390,7 +2390,7 @@ FileZero(File file, off_t offset, off_t amount, uint32 wait_event_info)
 int
 FileFallocate(File file, off_t offset, off_t amount, uint32 wait_event_info)
 {
-#ifdef HAVE_POSIX_FALLOCATE
+#if defined(HAVE_POSIX_FALLOCATE) || defined(WIN32)
 	int			returnCode;
 
 	Assert(FileIsValid(file));
@@ -2405,7 +2405,31 @@ FileFallocate(File file, off_t offset, off_t amount, uint32 wait_event_info)
 
 retry:
 	pgstat_report_wait_start(wait_event_info);
+#ifdef WIN32
+	{
+		off_t		old_size;
+		off_t		new_size;
+
+		/*
+		 * On Windows, files are not sparse by default, so ftruncate() can
+		 * allocate new disk blocks without writing through the page cache.
+		 */
+		old_size = lseek(VfdCache[file].fd, 0, SEEK_END);
+		if (old_size < 0)
+			return -1;
+		new_size = offset + amount;
+		if (new_size > old_size)
+			if (ftruncate(VfdCache[file].fd, new_size) < 0)
+				return -1;
+	}
+#else
+
+	/*
+	 * On Unix, files are usually sparse by default, so posix_fallocate() is
+	 * needed to allocate disk blocks without writing through the page cache.
+	 */
 	returnCode = posix_fallocate(VfdCache[file].fd, offset, amount);
+#endif
 	pgstat_report_wait_end();
 
 	if (returnCode == 0)
-- 
2.39.5
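
For anyone who wants to poke at the "extend only if the range ends beyond the current end of file" logic outside the tree, here is a minimal standalone sketch of it using plain POSIX lseek() and ftruncate(). sketch_extend_file() is a hypothetical name, not anything in PostgreSQL, and unlike the experimental hunk it returns 0 explicitly on success. Note the assumption: on most Unix filesystems ftruncate() just creates a sparse hole, so this only demonstrates the size logic, not the block allocation that the patch hopes to get on Windows.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure the file is at least offset + amount bytes
 * long, never shrinking it.  Returns 0 on success, -1 on failure with errno
 * set by lseek() or ftruncate().
 */
int
sketch_extend_file(int fd, off_t offset, off_t amount)
{
	off_t		old_size;
	off_t		new_size;

	assert(offset >= 0 && amount >= 0);

	/* Find the current end of file. */
	old_size = lseek(fd, 0, SEEK_END);
	if (old_size < 0)
		return -1;

	/* Only grow the file; a range already inside it needs no work. */
	new_size = offset + amount;
	if (new_size > old_size && ftruncate(fd, new_size) < 0)
		return -1;

	return 0;
}
```

The lseek()-then-ftruncate() pair is two separate steps, so another process extending the same file in between could be overtaken; that is the slight non-atomicity the commit message asks about.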