On Thu, Nov 14, 2024 at 12:33:20PM +0000, Bertrand Drouvot wrote: > Case 2 should be read as "in the 4-31" bytes range on 32-bit system as all > comparisons are done in size_t.
I'd suggest to use a -m32 in your gcc switches, when it comes to tests, but you already know that.. Anyway, as you say, the portability of v12 is OK even for sizeof(size_t) == 4 because we don't rely on any hardcoded values, and this patch does what it should in this case (double-checked myself manually for the three cases with -m32). > What would be unsafe on 32-bit would be to read up to 32 bytes while len < 32 > and that can not happen. > > As mentioned up-thread the comments are wrong on 32-bit, indeed they must be > read > as: > > Case 1: len < 4 bytes > Case 2: len in the 4-31 bytes range > Case 3: len >= 32 bytes This part could be indeed better than what's proposed in v12, so I would recommend to use sizeof(size_t) a bit more consistently rather than have the reader guess that. Note that some parts of v12-0001 use sizeof(size_t) in the comments, which makes things inconsistent. Some comments feel duplicated, as well, like the "no risk" mentions, which are clear enough based on the description and the limitations of the previous cases. I'd like to suggest a few tweaks, making the comments more flexible. See 0003 that applies on top of your latest patch set, reattaching v12 again. -- Michael
From 59bbfd7ece0deb785f73a94cd53d330d86b58875 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 8 Nov 2024 14:19:59 +0900 Subject: [PATCH v13 1/3] Optimize pg_memory_is_all_zeros() pg_memory_is_all_zeros() is currently doing byte per byte comparison and so could lead to performance regression or penalties when multi bytes comparison could be done instead. Let's provide an optimized version that divides the checks into multiple cases for safety reason and multiple phases for efficiency: Case 1: len < 8 bytes, then byte per byte comparison. Case 2: len in the 8-63 bytes range: - Phase 1: Byte by byte comparison, until the pointer is aligned. - Phase 2: size_t comparisons, with aligned pointers, up to the last location possible. - Phase 3: Byte by byte comparison, until the end location. Case 3: len >= 64 bytes, same as case 2 except that an additional phase is is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers to use SIMD instructions if available, up to the last aligned location possible. Case 1 and Case 2 are mandatory to ensure that we won't read beyond the memory area. Code mainly suggested by David Rowley. --- src/include/utils/memutils.h | 123 ++++++++++++++++++++++++++++++++++- 1 file changed, 120 insertions(+), 3 deletions(-) diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 3590c8bad9..0808e91b77 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -190,19 +190,136 @@ extern MemoryContext BumpContextCreate(MemoryContext parent, #define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) /* + * pg_memory_is_all_zeros + * * Test if a memory region starting at "ptr" and of size "len" is full of * zeroes. + * + * The test is divided into multiple cases for safety reason and multiple phases + * for efficiency. + * + * Case 1: len < 8 bytes, then byte per byte comparison. + * Case 2: len in the 8-63 bytes range: + * - Phase 1: Byte by byte comparison, until the pointer is aligned. + * - Phase 2: size_t comparisons, with aligned pointers, up to the last + * location possible. + * - Phase 3: Byte by byte comparison, until the end location. + * Case 3: len >= 64 bytes, same as case 2 except that an additional phase is + * is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) + * comparisons using bitwise OR, to encourage compilers to use SIMD + * instructions if available, up to the last aligned location possible. + * + * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the + * memory area. + * + * Caller must ensure that "ptr" is not NULL. */ static inline bool pg_memory_is_all_zeros(const void *ptr, size_t len) { - const char *p = (const char *) ptr; + const unsigned char *p = (const unsigned char *) ptr; + const unsigned char *end = &p[len]; + const unsigned char *aligned_end = (const unsigned char *) + ((uintptr_t) end & (~(sizeof(size_t) - 1))); - for (size_t i = 0; i < len; i++) + /* len < 8 bytes case */ + if (len < sizeof(size_t)) { - if (p[i] != 0) + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; + } + + /* len in the 8-63 bytes range case */ + if (len < sizeof(size_t) * 8) + { + /* Compare bytes until the pointer "p" is aligned */ + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) + { + if (p == end) + return true; + if (*p++ != 0) + return false; + } + + /* + * Compare remaining size_t-aligned chunks. + * + * There is no risk to read beyond the memory area, as aligned_end + * can't be > end as we are in the len 8-63 bytes range case (so len + * >= 8). + */ + for (; p < aligned_end; p += sizeof(size_t)) + { + if (*(size_t *) p != 0) + return false; + } + + /* Compare remaining bytes until the end */ + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; + } + + /* len >= 64 bytes case */ + + /* Compare bytes until the pointer "p" is aligned */ + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) + { + if (p == end) + return true; + + if (*p++ != 0) return false; } + + /* + * Compare 8 * sizeof(size_t) chunks at once. + * + * For performance reasons, we manually unroll this loop and purposefully + * use bitwise-ORs to combine each comparison. This prevents boolean + * short-circuiting and lets the compiler know that it's safe to access + * all 8 elements regardless of the result of the other comparisons. This + * seems to be enough to coax a few compilers into using SIMD + * instructions. + * + * There is no risk to read beyond the memory area as we are in the len >= + * 64 bytes case. + */ + for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) * 8) + { + if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) | + (((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0) | + (((size_t *) p)[4] != 0) | (((size_t *) p)[5] != 0) | + (((size_t *) p)[6] != 0) | (((size_t *) p)[7] != 0)) + return false; + } + + /* + * Compare remaining size_t-aligned chunks. + * + * There is no risk to read beyond the memory area, as aligned_end can't + * be > end as we are in the len >= 64 bytes case (so len >= 8). + */ + for (; p < aligned_end; p += sizeof(size_t)) + { + if (*(size_t *) p != 0) + return false; + } + + /* Compare remaining bytes until the end */ + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; } -- 2.45.2
From e5aaa29377f1d11d55ad781fdd17617a6e14f7df Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Fri, 8 Nov 2024 17:19:43 +0000 Subject: [PATCH v13 2/3] Make use of pg_memory_is_all_zeros() in PageIsVerifiedExtended() Now that pg_memory_is_all_zeros() is optimized to handle multi bytes comparison (instead of byte per byte), let's make use of it in PageIsVerifiedExtended(). --- src/backend/storage/page/bufpage.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index be6f1f62d2..aa264f61b9 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; - int i; bool checksum_failure = false; bool header_sane = false; - bool all_zeroes = false; uint16 checksum = 0; /* @@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) } /* Check all-zeroes case */ - all_zeroes = true; pagebytes = (size_t *) page; - for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++) - { - if (pagebytes[i] != 0) - { - all_zeroes = false; - break; - } - } - if (all_zeroes) + if (pg_memory_is_all_zeros(pagebytes, BLCKSZ)) return true; /* -- 2.45.2
From 7bcdf33fded7b5ea3b5f365dca22088fec4a46e9 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 15 Nov 2024 08:36:38 +0900 Subject: [PATCH v13 3/3] Tweak some comments in 0001 --- src/include/utils/memutils.h | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 0808e91b77..f830093ca9 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -198,19 +198,20 @@ extern MemoryContext BumpContextCreate(MemoryContext parent, * The test is divided into multiple cases for safety reason and multiple phases * for efficiency. * - * Case 1: len < 8 bytes, then byte per byte comparison. - * Case 2: len in the 8-63 bytes range: + * Case 1: len < sizeof(size_t) bytes, then byte per byte comparison. + * Case 2: len < (sizeof(size_t) * 8 - 1) bytes: * - Phase 1: Byte by byte comparison, until the pointer is aligned. * - Phase 2: size_t comparisons, with aligned pointers, up to the last * location possible. * - Phase 3: Byte by byte comparison, until the end location. - * Case 3: len >= 64 bytes, same as case 2 except that an additional phase is - * is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) - * comparisons using bitwise OR, to encourage compilers to use SIMD - * instructions if available, up to the last aligned location possible. + * Case 3: len >= (sizeof(size_t) * 8) bytes, same as case 2 except that an + * additional phase is placed between Phase 1 and Phase 2, with + * (8 * sizeof(size_t)) comparisons using bitwise OR, to encourage + * compilers to use SIMD instructions if available, up to the last + * aligned location possible. * * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the - * memory area. + * memory area. This is portable for 32-bit and 64-bit architectures. * * Caller must ensure that "ptr" is not NULL. */ @@ -222,7 +223,6 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) const unsigned char *aligned_end = (const unsigned char *) ((uintptr_t) end & (~(sizeof(size_t) - 1))); - /* len < 8 bytes case */ if (len < sizeof(size_t)) { while (p < end) @@ -233,7 +233,7 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) return true; } - /* len in the 8-63 bytes range case */ + /* "len" in the [sizeof(size_t), sizeof(size_t) * 8 - 1] range */ if (len < sizeof(size_t) * 8) { /* Compare bytes until the pointer "p" is aligned */ @@ -248,9 +248,8 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) /* * Compare remaining size_t-aligned chunks. * - * There is no risk to read beyond the memory area, as aligned_end - * can't be > end as we are in the len 8-63 bytes range case (so len - * >= 8). + * There is no risk to read beyond the memory area, as "aligned_end" + * cannot be higher than "end". */ for (; p < aligned_end; p += sizeof(size_t)) { @@ -267,9 +266,9 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) return true; } - /* len >= 64 bytes case */ + /* "len" in the [sizeof(size_t) * 8, inf[ range */ - /* Compare bytes until the pointer "p" is aligned */ + /* Compare bytes until the pointer "p" is aligned. */ while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) { if (p == end) @@ -288,9 +287,6 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) * all 8 elements regardless of the result of the other comparisons. This * seems to be enough to coax a few compilers into using SIMD * instructions. - * - * There is no risk to read beyond the memory area as we are in the len >= - * 64 bytes case. */ for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) * 8) { @@ -304,8 +300,8 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) /* * Compare remaining size_t-aligned chunks. * - * There is no risk to read beyond the memory area, as aligned_end can't - * be > end as we are in the len >= 64 bytes case (so len >= 8). + * There is no risk to read beyond the memory area, as "aligned_end" + * cannot be higher than "end". */ for (; p < aligned_end; p += sizeof(size_t)) { @@ -313,7 +309,7 @@ pg_memory_is_all_zeros(const void *ptr, size_t len) return false; } - /* Compare remaining bytes until the end */ + /* Compare remaining bytes until the end. */ while (p < end) { if (*p++ != 0) -- 2.45.2
signature.asc
Description: PGP signature