Hi, On Thu, Nov 14, 2024 at 09:27:06AM +0900, Michael Paquier wrote: > Makes sense to me to just do that, with a first < 8 loop, and a second > for the 8~63 range.
Thanks for looking at it! > There is also a "cant'" in the last size_t check. Simple typo. Please find attached v12, with more comments and comments changes to explain the multiple cases (for safety) and phases (for efficiency). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From 833f4c223717ce71c5b59d27ce54383b4fc237ce Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 8 Nov 2024 14:19:59 +0900 Subject: [PATCH v12 1/2] Optimize pg_memory_is_all_zeros() pg_memory_is_all_zeros() is currently doing byte per byte comparison and so could lead to performance regression or penalties when multi bytes comparison could be done instead. Let's provide an optimized version that divides the checks into multiple cases for safety reason and multiple phases for efficiency: Case 1: len < 8 bytes, then byte per byte comparison. Case 2: len in the 8-63 bytes range: - Phase 1: Byte by byte comparison, until the pointer is aligned. - Phase 2: size_t comparisons, with aligned pointers, up to the last location possible. - Phase 3: Byte by byte comparison, until the end location. Case 3: len >= 64 bytes, same as case 2 except that an additional phase is is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers to use SIMD instructions if available, up to the last aligned location possible. Case 1 and Case 2 are mandatory to ensure that we won't read beyond the memory area. Code mainly suggested by David Rowley. --- src/include/utils/memutils.h | 123 ++++++++++++++++++++++++++++++++++- 1 file changed, 120 insertions(+), 3 deletions(-) 100.0% src/include/utils/ diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 3590c8bad9..0808e91b77 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -190,19 +190,136 @@ extern MemoryContext BumpContextCreate(MemoryContext parent, #define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) /* + * pg_memory_is_all_zeros + * * Test if a memory region starting at "ptr" and of size "len" is full of * zeroes. + * + * The test is divided into multiple cases for safety reason and multiple phases + * for efficiency. + * + * Case 1: len < 8 bytes, then byte per byte comparison. + * Case 2: len in the 8-63 bytes range: + * - Phase 1: Byte by byte comparison, until the pointer is aligned. + * - Phase 2: size_t comparisons, with aligned pointers, up to the last + * location possible. + * - Phase 3: Byte by byte comparison, until the end location. + * Case 3: len >= 64 bytes, same as case 2 except that an additional phase is + * is placed between Phase 1 and Phase 2, with 8 * sizeof(size_t) + * comparisons using bitwise OR, to encourage compilers to use SIMD + * instructions if available, up to the last aligned location possible. + * + * Case 1 and Case 2 are mandatory to ensure that we won't read beyond the + * memory area. + * + * Caller must ensure that "ptr" is not NULL. */ static inline bool pg_memory_is_all_zeros(const void *ptr, size_t len) { - const char *p = (const char *) ptr; + const unsigned char *p = (const unsigned char *) ptr; + const unsigned char *end = &p[len]; + const unsigned char *aligned_end = (const unsigned char *) + ((uintptr_t) end & (~(sizeof(size_t) - 1))); + + /* len < 8 bytes case */ + if (len < sizeof(size_t)) + { + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; + } + + /* len in the 8-63 bytes range case */ + if (len < sizeof(size_t) * 8) + { + /* Compare bytes until the pointer "p" is aligned */ + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) + { + if (p == end) + return true; + if (*p++ != 0) + return false; + } + + /* + * Compare remaining size_t-aligned chunks. + * + * There is no risk to read beyond the memory area, as aligned_end + * can't be > end as we are in the len 8-63 bytes range case (so len + * >= 8). + */ + for (; p < aligned_end; p += sizeof(size_t)) + { + if (*(size_t *) p != 0) + return false; + } + + /* Compare remaining bytes until the end */ + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; + } + + /* len >= 64 bytes case */ - for (size_t i = 0; i < len; i++) + /* Compare bytes until the pointer "p" is aligned */ + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) { - if (p[i] != 0) + if (p == end) + return true; + + if (*p++ != 0) + return false; + } + + /* + * Compare 8 * sizeof(size_t) chunks at once. + * + * For performance reasons, we manually unroll this loop and purposefully + * use bitwise-ORs to combine each comparison. This prevents boolean + * short-circuiting and lets the compiler know that it's safe to access + * all 8 elements regardless of the result of the other comparisons. This + * seems to be enough to coax a few compilers into using SIMD + * instructions. + * + * There is no risk to read beyond the memory area as we are in the len >= + * 64 bytes case. + */ + for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) * 8) + { + if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) | + (((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0) | + (((size_t *) p)[4] != 0) | (((size_t *) p)[5] != 0) | + (((size_t *) p)[6] != 0) | (((size_t *) p)[7] != 0)) + return false; + } + + /* + * Compare remaining size_t-aligned chunks. + * + * There is no risk to read beyond the memory area, as aligned_end can't + * be > end as we are in the len >= 64 bytes case (so len >= 8). + */ + for (; p < aligned_end; p += sizeof(size_t)) + { + if (*(size_t *) p != 0) return false; } + + /* Compare remaining bytes until the end */ + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; } -- 2.34.1
>From 7a9c44b7b155312fce47027faf30ad3882fd06b5 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Fri, 8 Nov 2024 17:19:43 +0000 Subject: [PATCH v12 2/2] Make use of pg_memory_is_all_zeros() in PageIsVerifiedExtended() Now that pg_memory_is_all_zeros() is optimized to handle multi bytes comparison (instead of byte per byte), let's make use of it in PageIsVerifiedExtended(). --- src/backend/storage/page/bufpage.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) 100.0% src/backend/storage/page/ diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index be6f1f62d2..aa264f61b9 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; - int i; bool checksum_failure = false; bool header_sane = false; - bool all_zeroes = false; uint16 checksum = 0; /* @@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) } /* Check all-zeroes case */ - all_zeroes = true; pagebytes = (size_t *) page; - for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++) - { - if (pagebytes[i] != 0) - { - all_zeroes = false; - break; - } - } - if (all_zeroes) + if (pg_memory_is_all_zeros(pagebytes, BLCKSZ)) return true; /* -- 2.34.1