On Tue, Jul 30, 2024 at 11:16 AM Heikki Linnakangas <hlinn...@iki.fi> wrote: > On 30/07/2024 00:50, Thomas Munro wrote: > > On Wed, Jul 3, 2024 at 8:09 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Thomas Munro <thomas.mu...@gmail.com> writes: > > OK, <stdatomic.h> part on ice for now. Here's an update of the rest, > > this time also removing the barrier fallbacks as discussed in the LTO > > thread[1]. > > Looks good to me.
Thanks. I'll wait just a bit longer to see if anyone else has comments. > > Perhaps I'm missing something but I suspect we might be failing to > > include arch-x86.h on that compiler when we should... maybe it needs > > to detect _M_AMD64 too? > > Aha, yes I think that's it. Apparently, __x86_64__ is not defined on > MSVC. To prove that, I added garbage to the "#ifdef __x86_64__" guarded > block in atomics.h. The compilation passes on MSVC, but not on other > platforms: https://cirrus-ci.com/build/6310061188841472. > > That means that we're not getting the x86-64 instructions in > src/port/pg_crc32c_sse42.c on MSVC either. > > I think we should do: > > #ifdef _M_AMD64 > #define __x86_64__ > #endif > > somewhere, perhaps in src/include/port/win32.h. Hmm. I had come up with the opposite solution, because we already tested for _M_AMD64 explicitly elsewhere, and also I was thinking we would back-patch, and I don't want to cause problems for external code that thinks that __x86_64__ implies it can bust out some GCC inline assembler or something. But I don't have a strong opinion, your idea is certainly simpler to implement and I also wouldn't mind much if we just fixed it in master only, for fear of subtle breakage... Same problem probably exists for i386. I don't think CI, build farm or the EDB packaging team do 32 bit Windows, so that makes it a little hard to know if your blind code changes have broken or fixed anything... on the other hand it's pretty simple... I wondered if the pre-Meson system might have somehow defined __x86_64__, but I'm not seeing it. Commit b64d92f1a56 explicitly mentions that it was tested on MSVC, so I guess maybe it was just always "working" but not quite taking the intended code paths? Funny though, that code that calls _mm_pause() on AMD64 or the __asm thing that only works on i386 doesn't look like blind code to me. Curious.
From 47a8445c946e3792247fcf818c6e60ae72693f5c Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Tue, 30 Jul 2024 11:01:26 +1200 Subject: [PATCH] Fix x86 architecture detection on MSVC. We were looking for __x86_64__, but MSVC calls it _M_AMD64. Therefore we were mapping pg_{read,write}_barrier() to expensive pg_memory_barrier() instead of pg_compiler_barrier(), and not using the intended spinlock delay primitive. A couple of other places missed it as well. The problem probably exists for _M_IX86 (32 bit) too; this is untested due to lack of 32 bit Windows CI, but that macro was already used in our tree so it seems safe to use it in new places. Back-patch to all supported releases. Reviewed-by: Heikki Linnakangas <hlinn...@iki.fi> Discussion: https://postgr.es/m/CA%2BhUKGKAf_i6w7hB_3pqZXQeqn%2BixvY%2BCMps_n%3DmJ5HAatMjMw%40mail.gmail.com --- contrib/pgcrypto/crypt-blowfish.c | 4 ++-- src/include/port/atomics.h | 3 ++- src/include/port/atomics/arch-x86.h | 2 +- src/port/pg_crc32c_sse42.c | 2 +- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/contrib/pgcrypto/crypt-blowfish.c b/contrib/pgcrypto/crypt-blowfish.c index 5a1b1e1009..c34e66b2f7 100644 --- a/contrib/pgcrypto/crypt-blowfish.c +++ b/contrib/pgcrypto/crypt-blowfish.c @@ -38,10 +38,10 @@ #include "px-crypt.h" #include "px.h" -#ifdef __i386__ +#if defined(__i386__) || defined(_M_IX86) #define BF_ASM 0 /* 1 */ #define BF_SCALE 1 -#elif defined(__x86_64__) +#elif defined(__x86_64__) || defined(_M_AMD64) #define BF_ASM 0 #define BF_SCALE 1 #else diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index f6fa432d2d..ec59745168 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -65,7 +65,8 @@ */ #if defined(__arm__) || defined(__arm) || defined(__aarch64__) #include "port/atomics/arch-arm.h" -#elif defined(__i386__) || defined(__i386) || defined(__x86_64__) +#elif defined(__i386__) || defined(__i386) || defined(_M_IX86) || \ + defined(__x86_64__) || defined(_M_AMD64) #include "port/atomics/arch-x86.h" #elif defined(__ppc__) || defined(__powerpc__) || defined(__ppc64__) || defined(__powerpc64__) #include "port/atomics/arch-ppc.h" diff --git a/src/include/port/atomics/arch-x86.h b/src/include/port/atomics/arch-x86.h index 2a8eca30fc..4ecf540d12 100644 --- a/src/include/port/atomics/arch-x86.h +++ b/src/include/port/atomics/arch-x86.h @@ -113,7 +113,7 @@ pg_spin_delay_impl(void) { __asm__ __volatile__(" rep; nop \n"); } -#elif defined(_MSC_VER) && defined(__x86_64__) +#elif defined(_MSC_VER) && defined(_M_AMD64) #define PG_HAVE_SPIN_DELAY static __forceinline void pg_spin_delay_impl(void) diff --git a/src/port/pg_crc32c_sse42.c b/src/port/pg_crc32c_sse42.c index 7f88c11480..9a87070853 100644 --- a/src/port/pg_crc32c_sse42.c +++ b/src/port/pg_crc32c_sse42.c @@ -32,7 +32,7 @@ pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len) * and performance testing didn't show any performance gain from aligning * the begin address. */ -#ifdef __x86_64__ +#if defined(__x86_64__) || defined(_M_AMD64) while (p + 8 <= pend) { crc = (uint32) _mm_crc32_u64(crc, *((const uint64 *) p)); -- 2.39.2