Hi,

Lasse Collin wrote:
> > These two modules implement the stdc_store8_* functions from the ISO
> > C2y draft.
> 
> Implementing the aligned loads and stores with type punning results in
> strict aliasing violations and undefined behavior. I attached a demo
> program. When built with GCC 15.2.1 on x86-64, the output differs
> depending on the optimization level (-O0 vs. -O2). With -O2, the 16-bit
> store isn't seen by the 32-bit loads (v0 = 0x13121110 instead of
> 0xF3F21110). Adding -fno-strict-aliasing fixes it but that isn't
> standard C anymore.

Indeed. Thanks a lot for reporting this!

> It's great that C2y will standardize this functionality because
> currently they can be difficult to implement in a portable manner
> *while keeping them very fast*.

Oh yes. Now I understand why these seemingly trivial functions made it
into C2y :). <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3074.htm>

> I have written some notes on lines
> 202-323 and lines 621-644:
> 
> https://github.com/tukaani-project/xz/blob/bfc5f12a84a2a9df774ed16cd6eb58fd5ab24646/src/common/tuklib_integer.h#L202
> 
> The above needs to know if unaligned access is fast or not. For some
> archs it's simple, but for ARM64 and LoongArch I couldn't figure out
> anything better than building a test program and looking at objdump's
> output. The code and relevant notes are here:
> 
> https://github.com/tukaani-project/xz/blob/bfc5f12a84a2a9df774ed16cd6eb58fd5ab24646/m4/tuklib_integer.m4#L65

Thanks for these references. I'm applying the attached fix to Gnulib.

In particular, I appreciate your finding that the combination of memcpy
and __builtin_assume_aligned produces the best possible code (with
gcc >= 4.7 and clang).

What I'm doing differently than you did:
  - I don't distinguish "strict-align" and "non-strict-align" architectures,
    because in most "non-strict-align" architectures, unaligned accesses
    are slow. Compilers know this, and they prefer to emit a few instructions
    that each uses 1 cycle, than a single instruction which uses 10 or 20
    cycles.
  - So, the only distinction we need to make is regarding the compiler:
      - gcc >= 4.7, clang,
      - MSVC,
      - other compilers.
    In Gnulib, we don't care much about optimizing for 10 years old gcc
    versions. Making sure to get good code for gcc versions >= 10 (and
    clang) is what we care about.
    Find attached the test program, with which I evaluated which variant
    produces the best code.


2026-03-13  Bruno Haible  <[email protected]>

        stdc_load8_aligned, stdc_store8_aligned: Fix strict aliasing violations.
        Reported by Lasse Collin <[email protected]> in
        <https://lists.gnu.org/archive/html/bug-gnulib/2026-03/msg00094.html>.
        * lib/stdbit.in.h: Include <string.h>.
        (_GL_LOADSTORE8_VARIANT_A, _GL_LOADSTORE8_VARIANT_E,
        _GL_LOADSTORE8_VARIANT_F): New macros.
        (stdc_load8_aligned_beu16, stdc_load8_aligned_beu32,
        stdc_load8_aligned_beu64, stdc_load8_aligned_leu16,
        stdc_load8_aligned_leu32, stdc_load8_aligned_leu64,
        stdc_load8_aligned_bes8, stdc_load8_aligned_bes16,
        stdc_load8_aligned_bes32, stdc_load8_aligned_bes64,
        stdc_load8_aligned_les8, stdc_load8_aligned_les16,
        stdc_load8_aligned_les32, stdc_load8_aligned_les64,
        stdc_load8_bes8, stdc_load8_les8, stdc_store8_aligned_beu16,
        stdc_store8_aligned_beu32, stdc_store8_aligned_beu64,
        stdc_store8_aligned_leu16, stdc_store8_aligned_leu32,
        stdc_store8_aligned_leu64, stdc_store8_aligned_bes8,
        stdc_store8_aligned_bes16, stdc_store8_aligned_bes32,
        stdc_store8_aligned_bes64, stdc_store8_aligned_les8,
        stdc_store8_aligned_les16, stdc_store8_aligned_les32,
        stdc_store8_aligned_les64): Don't cast a pointer to a pointer to a
        different element type.

From a87fe458097deb3bdc62129bb2be982707733f01 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Fri, 13 Mar 2026 18:53:39 +0100
Subject: [PATCH] stdc_load8_aligned, stdc_store8_aligned: Fix strict aliasing
 violations.

Reported by Lasse Collin <[email protected]> in
<https://lists.gnu.org/archive/html/bug-gnulib/2026-03/msg00094.html>.

* lib/stdbit.in.h: Include <string.h>.
(_GL_LOADSTORE8_VARIANT_A, _GL_LOADSTORE8_VARIANT_E,
_GL_LOADSTORE8_VARIANT_F): New macros.
(stdc_load8_aligned_beu16, stdc_load8_aligned_beu32,
stdc_load8_aligned_beu64, stdc_load8_aligned_leu16,
stdc_load8_aligned_leu32, stdc_load8_aligned_leu64,
stdc_load8_aligned_bes8, stdc_load8_aligned_bes16,
stdc_load8_aligned_bes32, stdc_load8_aligned_bes64,
stdc_load8_aligned_les8, stdc_load8_aligned_les16,
stdc_load8_aligned_les32, stdc_load8_aligned_les64,
stdc_load8_bes8, stdc_load8_les8, stdc_store8_aligned_beu16,
stdc_store8_aligned_beu32, stdc_store8_aligned_beu64,
stdc_store8_aligned_leu16, stdc_store8_aligned_leu32,
stdc_store8_aligned_leu64, stdc_store8_aligned_bes8,
stdc_store8_aligned_bes16, stdc_store8_aligned_bes32,
stdc_store8_aligned_bes64, stdc_store8_aligned_les8,
stdc_store8_aligned_les16, stdc_store8_aligned_les32,
stdc_store8_aligned_les64): Don't cast a pointer to a pointer to a
different element type.
---
 ChangeLog       |  25 +++
 lib/stdbit.in.h | 420 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 337 insertions(+), 108 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f9e257f459..fc1b251d27 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2026-03-13  Bruno Haible  <[email protected]>
+
+	stdc_load8_aligned, stdc_store8_aligned: Fix strict aliasing violations.
+	Reported by Lasse Collin <[email protected]> in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2026-03/msg00094.html>.
+	* lib/stdbit.in.h: Include <string.h>.
+	(_GL_LOADSTORE8_VARIANT_A, _GL_LOADSTORE8_VARIANT_E,
+	_GL_LOADSTORE8_VARIANT_F): New macros.
+	(stdc_load8_aligned_beu16, stdc_load8_aligned_beu32,
+	stdc_load8_aligned_beu64, stdc_load8_aligned_leu16,
+	stdc_load8_aligned_leu32, stdc_load8_aligned_leu64,
+	stdc_load8_aligned_bes8, stdc_load8_aligned_bes16,
+	stdc_load8_aligned_bes32, stdc_load8_aligned_bes64,
+	stdc_load8_aligned_les8, stdc_load8_aligned_les16,
+	stdc_load8_aligned_les32, stdc_load8_aligned_les64,
+	stdc_load8_bes8, stdc_load8_les8, stdc_store8_aligned_beu16,
+	stdc_store8_aligned_beu32, stdc_store8_aligned_beu64,
+	stdc_store8_aligned_leu16, stdc_store8_aligned_leu32,
+	stdc_store8_aligned_leu64, stdc_store8_aligned_bes8,
+	stdc_store8_aligned_bes16, stdc_store8_aligned_bes32,
+	stdc_store8_aligned_bes64, stdc_store8_aligned_les8,
+	stdc_store8_aligned_les16, stdc_store8_aligned_les32,
+	stdc_store8_aligned_les64): Don't cast a pointer to a pointer to a
+	different element type.
+
 2026-03-13  Bruno Haible  <[email protected]>
 
 	stdc_store8: Add tests.
diff --git a/lib/stdbit.in.h b/lib/stdbit.in.h
index db3f072b49..aba7712810 100644
--- a/lib/stdbit.in.h
+++ b/lib/stdbit.in.h
@@ -50,6 +50,9 @@
 /* Get bswap_16, bswap_32, bswap_64.  */
 # include <byteswap.h>
 
+/* Get memcpy.  */
+# include <string.h>
+
 #endif
 
 _GL_INLINE_HEADER_BEGIN
@@ -1115,6 +1118,117 @@ stdc_bit_ceil_ull (unsigned long long int n)
 
 /* ISO C2y ?? 7.18.21 Endian-Aware 8-Bit Load  */
 
+/* Here we need to avoid type-punning, because the compiler's aliasing
+   analysis would frequently produce incorrect code, and requiring the
+   option '-fno-strict-aliasing' is no viable solution.
+   So, this definition won't work:
+
+     uint16_t
+     load16 (const unsigned char ptr[2])
+     {
+       return *(const uint16_t *)ptr;
+     }
+
+   Instead, the following definitions are candidates:
+
+     // Trick from Lasse Collin: use memcpy and __builtin_assume_aligned.
+     uint16_t
+     load16_a (const unsigned char ptr[2])
+     {
+       uint16_t value;
+       memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+       return value;
+     }
+
+     // Use __builtin_assume_aligned, without memcpy.
+     uint16_t
+     load16_b (const unsigned char ptr[2])
+     {
+       const unsigned char *aptr =
+         (const unsigned char *) __builtin_assume_aligned (ptr, 2);
+       #if WORDS_BIGENDIAN
+       return ((uint16_t) aptr [0] << 8) | (uint16_t) aptr [1];
+       #else
+       return (uint16_t) aptr [0] | ((uint16_t) aptr [1] << 8);
+       #endif
+     }
+
+     // Use memcpy and __assume.
+     uint16_t
+     load16_c (const unsigned char ptr[2])
+     {
+       __assume (((uintptr_t) ptr & (2 - 1)) == 0);
+       uint16_t value;
+       memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+       return value;
+     }
+
+     // Use __assume, without memcpy.
+     uint16_t
+     load16_d (const unsigned char ptr[2])
+     {
+       __assume (((uintptr_t) ptr & (2 - 1)) == 0);
+       #if WORDS_BIGENDIAN
+       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
+       #else
+       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
+       #endif
+     }
+
+     // Use memcpy, without __builtin_assume_aligned or __assume.
+     uint16_t
+     load16_e (const unsigned char ptr[2])
+     {
+       uint16_t value;
+       memcpy (&value, ptr, 2);
+       return value;
+     }
+
+     // Use the code for the unaligned case.
+     uint16_t
+     load16_f (const unsigned char ptr[2])
+     {
+       #if WORDS_BIGENDIAN
+       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
+       #else
+       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
+       #endif
+     }
+
+   Portability constraints:
+     - __builtin_assume_aligned works only in GCC >= 4.7 and clang >= 4.
+     - __assume works only with MSVC (_MSC_VER >= 1200).
+
+   Which variant produces the best code?
+     - memcpy is inlined only in gcc >= 3.4, g++ >= 4.9, clang >= 4.
+     - MSVC's __assume has no effect.
+     - With gcc 13:
+       On armelhf, arm64, i686, powerpc, powerpc64, powerpc64le, s390x, x86_64:
+       All of a,b,e,f are equally good.
+       On alpha, arm, hppa, mips, mips64, riscv64, sh4, sparc64:
+       Only a,b are good; f medium; e worst.
+     - With older gcc versions on x86_64:
+       gcc >= 10: All of a,b,e,f are equally good.
+       gcc < 10: Only a,e are good; b,f medium.
+     - With MSVC 14: Only c,e are good; d,f medium.
+
+   So, we use the following heuristic for getting good code:
+     - gcc >= 4.7, g++ >= 4.9, clang >= 4: Use variant a.
+     - MSVC: Use variant e.
+     - Otherwise: Use variant f.
+ */
+#if (defined __clang__ ? __clang_major__ >= 4 : \
+     (defined __GNUC__ \
+      && (defined __cplusplus \
+          ? __GNUC__ + (__GNUC_MINOR__ >= 9) > 4 \
+          : __GNUC__ + (__GNUC_MINOR__ >= 7) > 4)))
+# define _GL_LOADSTORE8_VARIANT_A 1
+#elif defined _MSC_VER
+# define _GL_LOADSTORE8_VARIANT_E 1
+#else
+# define _GL_LOADSTORE8_VARIANT_F 1
+#endif
+
 #if @GNULIB_STDC_LOAD8_ALIGNED@
 
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least8_t
@@ -1126,33 +1240,64 @@ stdc_load8_aligned_beu8 (const unsigned char ptr[1])
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least16_t
 stdc_load8_aligned_beu16 (const unsigned char ptr[2])
 {
-  uint16_t value = *(const uint16_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return value;
+# if _GL_LOADSTORE8_VARIANT_F
+  return ((uint_fast16_t) ptr[0] << 8) | (uint_fast16_t) ptr[1];
 # else
+  uint16_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 2);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return value;
+#  else
   return bswap_16 (value);
+#  endif
 # endif
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least32_t
 stdc_load8_aligned_beu32 (const unsigned char ptr[4])
 {
-  uint32_t value = *(const uint32_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return value;
+# if _GL_LOADSTORE8_VARIANT_F
+  return ((uint_fast32_t) ptr[0] << 24) | ((uint_fast32_t) ptr[1] << 16)
+         | ((uint_fast32_t) ptr[2] << 8) | (uint_fast32_t) ptr[3];
 # else
+  uint32_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 4), 4);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 4);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return value;
+#  else
   return bswap_32 (value);
+#  endif
 # endif
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least64_t
 stdc_load8_aligned_beu64 (const unsigned char ptr[8])
 {
-  uint64_t value = *(const uint64_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return value;
+# if _GL_LOADSTORE8_VARIANT_F
+  return ((uint_fast64_t) ptr[0] << 56) | ((uint_fast64_t) ptr[1] << 48)
+         | ((uint_fast64_t) ptr[2] << 40) | ((uint_fast64_t) ptr[3] << 32)
+         | ((uint_fast64_t) ptr[4] << 24) | ((uint_fast64_t) ptr[5] << 16)
+         | ((uint_fast64_t) ptr[6] << 8) | (uint_fast64_t) ptr[7];
 # else
+  uint64_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 8), 8);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 8);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return value;
+#  else
   return bswap_64 (value);
+#  endif
 # endif
 }
 
@@ -1165,112 +1310,113 @@ stdc_load8_aligned_leu8 (const unsigned char ptr[1])
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least16_t
 stdc_load8_aligned_leu16 (const unsigned char ptr[2])
 {
-  uint16_t value = *(const uint16_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return bswap_16 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  return (uint_fast16_t) ptr[0] | ((uint_fast16_t) ptr[1] << 8);
 # else
+  uint16_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 2);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return bswap_16 (value);
+#  else
   return value;
+#  endif
 # endif
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least32_t
 stdc_load8_aligned_leu32 (const unsigned char ptr[4])
 {
-  uint32_t value = *(const uint32_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return bswap_32 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  return (uint_fast32_t) ptr[0] | ((uint_fast32_t) ptr[1] << 8)
+         | ((uint_fast32_t) ptr[2] << 16) | ((uint_fast32_t) ptr[3] << 24);
 # else
+  uint32_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 4), 4);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 4);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return bswap_32 (value);
+#  else
   return value;
+#  endif
 # endif
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE uint_least64_t
 stdc_load8_aligned_leu64 (const unsigned char ptr[8])
 {
-  uint64_t value = *(const uint64_t *)ptr;
-# ifdef WORDS_BIGENDIAN
-  return bswap_64 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  return (uint_fast64_t) ptr[0] | ((uint_fast64_t) ptr[1] << 8)
+         | ((uint_fast64_t) ptr[2] << 16) | ((uint_fast64_t) ptr[3] << 24)
+         | ((uint_fast64_t) ptr[4] << 32) | ((uint_fast64_t) ptr[5] << 40)
+         | ((uint_fast64_t) ptr[6] << 48) | ((uint_fast64_t) ptr[7] << 56);
 # else
+  uint64_t value;
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (&value, __builtin_assume_aligned (ptr, 8), 8);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (&value, ptr, 8);
+#  endif
+#  ifdef WORDS_BIGENDIAN
+  return bswap_64 (value);
+#  else
   return value;
+#  endif
 # endif
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least8_t
 stdc_load8_aligned_bes8 (const unsigned char ptr[1])
 {
-  return *(signed char *)ptr;
+  return (int8_t) ptr[0];
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least16_t
 stdc_load8_aligned_bes16 (const unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  return *(const int16_t *)ptr;
-# else
-  uint16_t value = *(const uint16_t *)ptr;
-  return (int16_t) bswap_16 (value);
-# endif
+  return (int16_t) stdc_load8_aligned_beu16 (ptr);
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least32_t
 stdc_load8_aligned_bes32 (const unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  return *(const int32_t *)ptr;
-# else
-  uint32_t value = *(const uint32_t *)ptr;
-  return (int32_t) bswap_32 (value);
-# endif
+  return (int32_t) stdc_load8_aligned_beu32 (ptr);
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least64_t
 stdc_load8_aligned_bes64 (const unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  return *(const int64_t *)ptr;
-# else
-  uint64_t value = *(const uint64_t *)ptr;
-  return (int64_t) bswap_64 (value);
-# endif
+  return (int64_t) stdc_load8_aligned_beu64 (ptr);
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least8_t
 stdc_load8_aligned_les8 (const unsigned char ptr[1])
 {
-  return *(signed char *)ptr;
+  return (int8_t) ptr[0];
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least16_t
 stdc_load8_aligned_les16 (const unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  uint16_t value = *(const uint16_t *)ptr;
-  return (int16_t) bswap_16 (value);
-# else
-  return *(const int16_t *)ptr;
-# endif
+  return (int16_t) stdc_load8_aligned_leu16 (ptr);
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least32_t
 stdc_load8_aligned_les32 (const unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  uint32_t value = *(const uint32_t *)ptr;
-  return (int32_t) bswap_32 (value);
-# else
-  return *(const int32_t *)ptr;
-# endif
+  return (int32_t) stdc_load8_aligned_leu32 (ptr);
 }
 
 _GL_STDC_LOAD8_ALIGNED_INLINE int_least64_t
 stdc_load8_aligned_les64 (const unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  uint64_t value = *(const uint64_t *)ptr;
-  return (int64_t) bswap_64 (value);
-# else
-  return *(const int64_t *)ptr;
-# endif
+  return (int64_t) stdc_load8_aligned_leu64 (ptr);
 }
 
 #endif
@@ -1336,7 +1482,7 @@ stdc_load8_leu64 (const unsigned char ptr[8])
 _GL_STDC_LOAD8_INLINE int_least8_t
 stdc_load8_bes8 (const unsigned char ptr[1])
 {
-  return *(signed char *)ptr;
+  return (int8_t) ptr[0];
 }
 
 _GL_STDC_LOAD8_INLINE int_least16_t
@@ -1366,7 +1512,7 @@ stdc_load8_bes64 (const unsigned char ptr[8])
 _GL_STDC_LOAD8_INLINE int_least8_t
 stdc_load8_les8 (const unsigned char ptr[1])
 {
-  return *(signed char *)ptr;
+  return (int8_t) ptr[0];
 }
 
 _GL_STDC_LOAD8_INLINE int_least16_t
@@ -1409,30 +1555,71 @@ stdc_store8_aligned_beu8 (uint_least8_t value, unsigned char ptr[1])
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_beu16 (uint_least16_t value, unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint16_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) (value >> 8) & 0xFFU;
+  ptr[1] = (unsigned char) value & 0xFFU;
 # else
-  *(uint16_t *)ptr = bswap_16 (value);
+  uint16_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = value;
+#  else
+  uvalue = bswap_16 (value);
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 2), &uvalue, 2);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 2);
+#  endif
 # endif
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_beu32 (uint_least32_t value, unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint32_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) (value >> 24) & 0xFFU;
+  ptr[1] = (unsigned char) (value >> 16) & 0xFFU;
+  ptr[2] = (unsigned char) (value >> 8) & 0xFFU;
+  ptr[3] = (unsigned char) value & 0xFFU;
 # else
-  *(uint32_t *)ptr = bswap_32 (value);
+  uint32_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = value;
+#  else
+  uvalue = bswap_32 (value);
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 4), &uvalue, 4);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 4);
+#  endif
 # endif
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_beu64 (uint_least64_t value, unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint64_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) (value >> 56) & 0xFFU;
+  ptr[1] = (unsigned char) (value >> 48) & 0xFFU;
+  ptr[2] = (unsigned char) (value >> 40) & 0xFFU;
+  ptr[3] = (unsigned char) (value >> 32) & 0xFFU;
+  ptr[4] = (unsigned char) (value >> 24) & 0xFFU;
+  ptr[5] = (unsigned char) (value >> 16) & 0xFFU;
+  ptr[6] = (unsigned char) (value >> 8) & 0xFFU;
+  ptr[7] = (unsigned char) value & 0xFFU;
 # else
-  *(uint64_t *)ptr = bswap_64 (value);
+  uint64_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = value;
+#  else
+  uvalue = bswap_64 (value);
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 8), &uvalue, 8);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 8);
+#  endif
 # endif
 }
 
@@ -1445,103 +1632,120 @@ stdc_store8_aligned_leu8 (uint_least8_t value, unsigned char ptr[1])
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_leu16 (uint_least16_t value, unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint16_t *)ptr = bswap_16 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) value & 0xFFU;
+  ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
 # else
-  *(uint16_t *)ptr = value;
+  uint16_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = bswap_16 (value);
+#  else
+  uvalue = value;
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 2), &uvalue, 2);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 2);
+#  endif
 # endif
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_leu32 (uint_least32_t value, unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint32_t *)ptr = bswap_32 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) value & 0xFFU;
+  ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
+  ptr[2] = (unsigned char) (value >> 16) & 0xFFU;
+  ptr[3] = (unsigned char) (value >> 24) & 0xFFU;
 # else
-  *(uint32_t *)ptr = value;
+  uint32_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = bswap_32 (value);
+#  else
+  uvalue = value;
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 4), &uvalue, 4);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 4);
+#  endif
 # endif
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_leu64 (uint_least64_t value, unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint64_t *)ptr = bswap_64 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+  ptr[0] = (unsigned char) value & 0xFFU;
+  ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
+  ptr[2] = (unsigned char) (value >> 16) & 0xFFU;
+  ptr[3] = (unsigned char) (value >> 24) & 0xFFU;
+  ptr[4] = (unsigned char) (value >> 32) & 0xFFU;
+  ptr[5] = (unsigned char) (value >> 40) & 0xFFU;
+  ptr[6] = (unsigned char) (value >> 48) & 0xFFU;
+  ptr[7] = (unsigned char) (value >> 56) & 0xFFU;
 # else
-  *(uint64_t *)ptr = value;
+  uint64_t uvalue;
+#  ifdef WORDS_BIGENDIAN
+  uvalue = bswap_64 (value);
+#  else
+  uvalue = value;
+#  endif
+#  if _GL_LOADSTORE8_VARIANT_A
+  memcpy (__builtin_assume_aligned (ptr, 8), &uvalue, 8);
+#  else /* _GL_LOADSTORE8_VARIANT_E */
+  memcpy (ptr, &uvalue, 8);
+#  endif
 # endif
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_bes8 (int_least8_t value, unsigned char ptr[1])
 {
-  *(signed char *)ptr = value;
+  ptr[0] = (uint8_t) value;
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_bes16 (int_least16_t value, unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  *(int16_t *)ptr = value;
-# else
-  *(uint16_t *)ptr = bswap_16 ((uint16_t) value);
-# endif
+  stdc_store8_aligned_beu16 ((uint16_t) value, ptr);
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_bes32 (int_least32_t value, unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  *(int32_t *)ptr = value;
-# else
-  *(uint32_t *)ptr = bswap_32 ((uint32_t) value);
-# endif
+  stdc_store8_aligned_beu32 ((uint32_t) value, ptr);
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_bes64 (int_least64_t value, unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  *(int64_t *)ptr = value;
-# else
-  *(uint64_t *)ptr = bswap_64 ((uint64_t) value);
-# endif
+  stdc_store8_aligned_beu64 ((uint64_t) value, ptr);
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_les8 (int_least8_t value, unsigned char ptr[1])
 {
-  *(signed char *)ptr = value;
+  ptr[0] = (uint8_t) value;
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_les16 (int_least16_t value, unsigned char ptr[2])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint16_t *)ptr = bswap_16 ((uint16_t) value);
-# else
-  *(int16_t *)ptr = value;
-# endif
+  stdc_store8_aligned_leu16 ((uint16_t) value, ptr);
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_les32 (int_least32_t value, unsigned char ptr[4])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint32_t *)ptr = bswap_32 ((uint32_t) value);
-# else
-  *(int32_t *)ptr = value;
-# endif
+  stdc_store8_aligned_leu32 ((uint32_t) value, ptr);
 }
 
 _GL_STDC_STORE8_ALIGNED_INLINE void
 stdc_store8_aligned_les64 (int_least64_t value, unsigned char ptr[8])
 {
-# ifdef WORDS_BIGENDIAN
-  *(uint64_t *)ptr = bswap_64 ((uint64_t) value);
-# else
-  *(int64_t *)ptr = value;
-# endif
+  stdc_store8_aligned_leu64 ((uint64_t) value, ptr);
 }
 
 #endif
-- 
2.52.0

#define uint16_t unsigned short
#define size_t unsigned long
#define uintptr_t unsigned long
#ifdef __cplusplus
extern "C" {
#endif
extern void *memcpy (void *, const void *, size_t);
#ifdef __cplusplus
}
#endif

#if !defined _MSC_VER
     uint16_t
     load16 (const unsigned char ptr[2])
     { return *(const uint16_t *)ptr; }

     // Trick from Lasse Collin: use memcpy and __builtin_assume_aligned.
     uint16_t
     load16_a (const unsigned char ptr[2])
     {
       uint16_t value;
       memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
       return value;
     }

     // Use __builtin_assume_aligned, without memcpy.
     uint16_t
     load16_b (const unsigned char ptr[2])
     {
       const unsigned char *aptr =
         (const unsigned char *) __builtin_assume_aligned (ptr, 2);
       #if WORDS_BIGENDIAN
       return ((uint16_t) aptr [0] << 8) | (uint16_t) aptr [1];
       #else
       return (uint16_t) aptr [0] | ((uint16_t) aptr [1] << 8);
       #endif
     }
#endif

#if defined _MSC_VER
     // Use memcpy and __assume.
     uint16_t
     load16_c (const unsigned char ptr[2])
     {
       __assume (((uintptr_t) ptr & (2 - 1)) == 0);
       uint16_t value;
       memcpy (&value, ptr, 2);
       return value;
     }

     // Use __assume, without memcpy.
     uint16_t
     load16_d (const unsigned char ptr[2])
     {
       __assume (((uintptr_t) ptr & (2 - 1)) == 0);
       #if WORDS_BIGENDIAN
       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
       #else
       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
       #endif
     }
#endif

     // Use memcpy, without __builtin_assume_aligned or __assume.
     uint16_t
     load16_e (const unsigned char ptr[2])
     {
       uint16_t value;
       memcpy (&value, ptr, 2);
       return value;
     }

     // Use the code for the unaligned case.
     uint16_t
     load16_f (const unsigned char ptr[2])
     {
       #if WORDS_BIGENDIAN
       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
       #else
       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
       #endif
     }

// Modern gcc:
// arm64        all good
// armelhf      all good
// i686         all good
// powerpc      all good
// powerpc64    all good
// powerpc64le  all good
// s390x        all good
// x86_64       all good
// alpha        a,b good; f medium; e worst
// arm          a,b good; f medium; e worst
// hppa         a,b good; f medium; e worst
// mips         a,b good; f medium; e worst
// mips64       a,b good; f medium; e worst
// riscv64      a,b good; f medium; e worst
// sh4          a,b good; f medium; e worst
// sparc64      a,b good; f medium; e worst

// Older gcc, x86_64:
// 10..15       all good
// 4.7..9       a,e good; b,f medium

// MSVC         c,e good; d,f medium

Reply via email to