Hi,
Lasse Collin wrote:
> > These two modules implement the stdc_store8_* functions from the ISO
> > C2y draft.
>
> Implementing the aligned loads and stores with type punning results in
> strict aliasing violations and undefined behavior. I attached a demo
> program. When built with GCC 15.2.1 on x86-64, the output differs
> depending on the optimization level (-O0 vs. -O2). With -O2, the 16-bit
> store isn't seen by the 32-bit loads (v0 = 0x13121110 instead of
> 0xF3F21110). Adding -fno-strict-aliasing fixes it but that isn't
> standard C anymore.
Indeed. Thanks a lot for reporting this!
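Here is a minimal sketch of the reported scenario. Lasse's attached demo is not reproduced here. The sketch uses no pointer casts: every access goes through memcpy, which copies object representations, so no object is accessed through an lvalue of an incompatible type. The helper names are hypothetical and are not Gnulib API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not part of Gnulib): native-byte-order access
   through memcpy instead of casting 'unsigned char *' to 'uint32_t *'.  */
static uint32_t
load32_native (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

static void
store16_native (unsigned char *p, uint16_t v)
{
  memcpy (p, &v, sizeof v);
}

/* The reported pattern: a 32-bit load, a 16-bit store into bytes 2 and 3,
   and another 32-bit load that must observe that store.  */
static uint32_t
demo_after_store (void)
{
  unsigned char buf[4] = { 0x10, 0x11, 0x12, 0x13 };
  uint32_t v0 = load32_native (buf);
  store16_native (buf + 2, 0xF3F2);
  uint32_t v1 = load32_native (buf);
  assert (v1 != v0);   /* the second load sees the store */
  return v1;
}
```

On a little-endian host such as x86-64, demo_after_store () returns 0xF3F21110, the value the report expects, with -O0 and -O2 alike. On a big-endian host it returns 0x1011F3F2.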
> It's great that C2y will standardize this functionality because
> currently they can be difficult to implement in a portable manner
> *while keeping them very fast*.
Oh yes. Now I understand why these seemingly trivial functions made it
into C2y :). <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3074.htm>
> I have written some notes on lines
> 202-323 and lines 621-644:
>
> https://github.com/tukaani-project/xz/blob/bfc5f12a84a2a9df774ed16cd6eb58fd5ab24646/src/common/tuklib_integer.h#L202
>
> The above needs to know if unaligned access is fast or not. For some
> archs it's simple, but for ARM64 and LoongArch I couldn't figure out
> anything better than building a test program and looking at objdump's
> output. The code and relevant notes are here:
>
> https://github.com/tukaani-project/xz/blob/bfc5f12a84a2a9df774ed16cd6eb58fd5ab24646/m4/tuklib_integer.m4#L65
Thanks for these references. I'm applying the attached fix to Gnulib.
In particular, I appreciate your finding that the combination of memcpy
and __builtin_assume_aligned produces the best possible code (with
gcc >= 4.7 and clang).
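A minimal sketch of that combination for a native-order 32-bit load and store follows. The helper and macro names are hypothetical, and the compiler test mirrors the GCC >= 4.7 / clang >= 4 condition used in the patch below. The assert states the alignment that __builtin_assume_aligned promises to the compiler. If a misaligned pointer is passed after that promise, the behavior is undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: the alignment hint where the builtin is available,
   a no-op elsewhere.  memcpy keeps the access free of aliasing problems;
   the hint lets the compiler emit a single aligned access.  */
#if (defined __clang__ ? __clang_major__ >= 4 \
     : defined __GNUC__ \
       && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
# define ASSUME_ALIGNED4(p) __builtin_assume_aligned ((p), 4)
#else
# define ASSUME_ALIGNED4(p) (p)
#endif

static uint32_t
load32_aligned (const unsigned char *p)
{
  uint32_t v;
  assert ((uintptr_t) p % 4 == 0);   /* the caller's obligation */
  memcpy (&v, ASSUME_ALIGNED4 (p), 4);
  return v;
}

static void
store32_aligned (unsigned char *p, uint32_t v)
{
  assert ((uintptr_t) p % 4 == 0);
  memcpy (ASSUME_ALIGNED4 (p), &v, 4);
}
```

For example, with `_Alignas (4) unsigned char buf[4];`, calling store32_aligned (buf, 0xA1B2C3D4u) and then load32_aligned (buf) gives back 0xA1B2C3D4. These helpers use host byte order. An endian-aware version would add a byte swap, as the patch does with bswap_32.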
What I'm doing differently from you:
- I don't distinguish "strict-align" and "non-strict-align" architectures,
  because on most "non-strict-align" architectures unaligned accesses
  are slow. Compilers know this, and they prefer to emit a few instructions
  that take 1 cycle each over a single instruction that takes 10 or 20
  cycles.
- So, the only distinction we need to make is by compiler:
  - gcc >= 4.7 (g++ >= 4.9), clang >= 4,
  - MSVC,
  - other compilers.
In Gnulib, we don't care much about optimizing for 10-year-old gcc
versions. What we care about is getting good code with gcc >= 10 (and
clang).
Find attached the test program that I used to evaluate which variant
produces the best code.
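The attached program compares generated code. It does not check that the variants compute the same value. The following minimal sketch performs such a check. It loads every 16-bit pattern through the three strategies that Gnulib now selects between. The function names are hypothetical, and the comments refer to variants a, e and f as listed in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Variant a: memcpy plus alignment hint (GCC >= 4.7, clang >= 4 only).  */
static uint16_t
load16_memcpy_aligned (const unsigned char *p)
{
  uint16_t v;
  assert ((uintptr_t) p % 2 == 0);
#if (defined __clang__ ? __clang_major__ >= 4 \
     : defined __GNUC__ \
       && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
  memcpy (&v, __builtin_assume_aligned (p, 2), 2);
#else
  memcpy (&v, p, 2);
#endif
  return v;
}

/* Variant e: plain memcpy.  */
static uint16_t
load16_memcpy (const unsigned char *p)
{
  uint16_t v;
  memcpy (&v, p, 2);
  return v;
}

/* Variant f: byte shifts in an explicit byte order.  */
static uint16_t
load16_shift (const unsigned char *p, int big_endian)
{
  return big_endian
         ? (uint16_t) ((p[0] << 8) | p[1])
         : (uint16_t) (p[0] | (p[1] << 8));
}

/* Reading a uint16_t through 'unsigned char' is allowed by the aliasing
   rules.  */
static int
host_is_big_endian (void)
{
  uint16_t one = 1;
  return *(const unsigned char *) &one == 0;
}

/* Returns 1 if all three strategies agree on all 65536 bit patterns.  */
static int
load16_variants_agree (void)
{
  _Alignas (2) unsigned char buf[2];
  int be = host_is_big_endian ();
  for (uint32_t i = 0; i <= 0xFFFF; i++)
    {
      buf[0] = (unsigned char) (i & 0xFF);
      buf[1] = (unsigned char) (i >> 8);
      uint16_t a = load16_memcpy_aligned (buf);
      uint16_t e = load16_memcpy (buf);
      uint16_t f = load16_shift (buf, be);
      if (a != e || e != f)
        return 0;
    }
  return 1;
}
```

The `_Alignas` specifier requires C11. The preprocessor condition is the same one the patch uses to select variant a.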
From a87fe458097deb3bdc62129bb2be982707733f01 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Fri, 13 Mar 2026 18:53:39 +0100
Subject: [PATCH] stdc_load8_aligned, stdc_store8_aligned: Fix strict aliasing
violations.
Reported by Lasse Collin <[email protected]> in
<https://lists.gnu.org/archive/html/bug-gnulib/2026-03/msg00094.html>.
* lib/stdbit.in.h: Include <string.h>.
(_GL_LOADSTORE8_VARIANT_A, _GL_LOADSTORE8_VARIANT_E,
_GL_LOADSTORE8_VARIANT_F): New macros.
(stdc_load8_aligned_beu16, stdc_load8_aligned_beu32,
stdc_load8_aligned_beu64, stdc_load8_aligned_leu16,
stdc_load8_aligned_leu32, stdc_load8_aligned_leu64,
stdc_load8_aligned_bes8, stdc_load8_aligned_bes16,
stdc_load8_aligned_bes32, stdc_load8_aligned_bes64,
stdc_load8_aligned_les8, stdc_load8_aligned_les16,
stdc_load8_aligned_les32, stdc_load8_aligned_les64,
stdc_load8_bes8, stdc_load8_les8, stdc_store8_aligned_beu16,
stdc_store8_aligned_beu32, stdc_store8_aligned_beu64,
stdc_store8_aligned_leu16, stdc_store8_aligned_leu32,
stdc_store8_aligned_leu64, stdc_store8_aligned_bes8,
stdc_store8_aligned_bes16, stdc_store8_aligned_bes32,
stdc_store8_aligned_bes64, stdc_store8_aligned_les8,
stdc_store8_aligned_les16, stdc_store8_aligned_les32,
stdc_store8_aligned_les64): Don't cast a pointer to a pointer to a
different element type.
---
ChangeLog | 25 +++
lib/stdbit.in.h | 420 +++++++++++++++++++++++++++++++++++-------------
2 files changed, 337 insertions(+), 108 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index f9e257f459..fc1b251d27 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2026-03-13 Bruno Haible <[email protected]>
+
+ stdc_load8_aligned, stdc_store8_aligned: Fix strict aliasing violations.
+ Reported by Lasse Collin <[email protected]> in
+ <https://lists.gnu.org/archive/html/bug-gnulib/2026-03/msg00094.html>.
+ * lib/stdbit.in.h: Include <string.h>.
+ (_GL_LOADSTORE8_VARIANT_A, _GL_LOADSTORE8_VARIANT_E,
+ _GL_LOADSTORE8_VARIANT_F): New macros.
+ (stdc_load8_aligned_beu16, stdc_load8_aligned_beu32,
+ stdc_load8_aligned_beu64, stdc_load8_aligned_leu16,
+ stdc_load8_aligned_leu32, stdc_load8_aligned_leu64,
+ stdc_load8_aligned_bes8, stdc_load8_aligned_bes16,
+ stdc_load8_aligned_bes32, stdc_load8_aligned_bes64,
+ stdc_load8_aligned_les8, stdc_load8_aligned_les16,
+ stdc_load8_aligned_les32, stdc_load8_aligned_les64,
+ stdc_load8_bes8, stdc_load8_les8, stdc_store8_aligned_beu16,
+ stdc_store8_aligned_beu32, stdc_store8_aligned_beu64,
+ stdc_store8_aligned_leu16, stdc_store8_aligned_leu32,
+ stdc_store8_aligned_leu64, stdc_store8_aligned_bes8,
+ stdc_store8_aligned_bes16, stdc_store8_aligned_bes32,
+ stdc_store8_aligned_bes64, stdc_store8_aligned_les8,
+ stdc_store8_aligned_les16, stdc_store8_aligned_les32,
+ stdc_store8_aligned_les64): Don't cast a pointer to a pointer to a
+ different element type.
+
2026-03-13 Bruno Haible <[email protected]>
stdc_store8: Add tests.
diff --git a/lib/stdbit.in.h b/lib/stdbit.in.h
index db3f072b49..aba7712810 100644
--- a/lib/stdbit.in.h
+++ b/lib/stdbit.in.h
@@ -50,6 +50,9 @@
/* Get bswap_16, bswap_32, bswap_64. */
# include <byteswap.h>
+/* Get memcpy. */
+# include <string.h>
+
#endif
_GL_INLINE_HEADER_BEGIN
@@ -1115,6 +1118,117 @@ stdc_bit_ceil_ull (unsigned long long int n)
/* ISO C2y ?? 7.18.21 Endian-Aware 8-Bit Load */
+/* Here we need to avoid type-punning, because the compiler's aliasing
+ analysis would frequently produce incorrect code, and requiring the
+ option '-fno-strict-aliasing' is no viable solution.
+ So, this definition won't work:
+
+ uint16_t
+ load16 (const unsigned char ptr[2])
+ {
+ return *(const uint16_t *)ptr;
+ }
+
+ Instead, the following definitions are candidates:
+
+ // Trick from Lasse Collin: use memcpy and __builtin_assume_aligned.
+ uint16_t
+ load16_a (const unsigned char ptr[2])
+ {
+ uint16_t value;
+ memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+ return value;
+ }
+
+ // Use __builtin_assume_aligned, without memcpy.
+ uint16_t
+ load16_b (const unsigned char ptr[2])
+ {
+ const unsigned char *aptr =
+ (const unsigned char *) __builtin_assume_aligned (ptr, 2);
+ #if WORDS_BIGENDIAN
+ return ((uint16_t) aptr [0] << 8) | (uint16_t) aptr [1];
+ #else
+ return (uint16_t) aptr [0] | ((uint16_t) aptr [1] << 8);
+ #endif
+ }
+
+ // Use memcpy and __assume.
+ uint16_t
+ load16_c (const unsigned char ptr[2])
+ {
+ __assume (((uintptr_t) ptr & (2 - 1)) == 0);
+ uint16_t value;
+ memcpy (&value, ptr, 2);
+ return value;
+ }
+
+ // Use __assume, without memcpy.
+ uint16_t
+ load16_d (const unsigned char ptr[2])
+ {
+ __assume (((uintptr_t) ptr & (2 - 1)) == 0);
+ #if WORDS_BIGENDIAN
+ return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
+ #else
+ return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
+ #endif
+ }
+
+ // Use memcpy, without __builtin_assume_aligned or __assume.
+ uint16_t
+ load16_e (const unsigned char ptr[2])
+ {
+ uint16_t value;
+ memcpy (&value, ptr, 2);
+ return value;
+ }
+
+ // Use the code for the unaligned case.
+ uint16_t
+ load16_f (const unsigned char ptr[2])
+ {
+ #if WORDS_BIGENDIAN
+ return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
+ #else
+ return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
+ #endif
+ }
+
+ Portability constraints:
+ - __builtin_assume_aligned works only in GCC >= 4.7 and clang >= 4.
+ - __assume works only with MSVC (_MSC_VER >= 1200).
+
+ Which variant produces the best code?
+ - memcpy is inlined only in gcc >= 3.4, g++ >= 4.9, clang >= 4.
+ - MSVC's __assume has no effect.
+ - With gcc 13:
+ On armelhf, arm64, i686, powerpc, powerpc64, powerpc64le, s390x, x86_64:
+ All of a,b,e,f are equally good.
+ On alpha, arm, hppa, mips, mips64, riscv64, sh4, sparc64:
+ Only a,b are good; f medium; e worst.
+ - With older gcc versions on x86_64:
+ gcc >= 10: All of a,b,e,f are equally good.
+ gcc < 10: Only a,e are good; b,f medium.
+ - With MSVC 14: Only c,e are good; d,f medium.
+
+ So, we use the following heuristic for getting good code:
+ - gcc >= 4.7, g++ >= 4.9, clang >= 4: Use variant a.
+ - MSVC: Use variant e.
+ - Otherwise: Use variant f.
+ */
+#if (defined __clang__ ? __clang_major__ >= 4 : \
+ (defined __GNUC__ \
+ && (defined __cplusplus \
+ ? __GNUC__ + (__GNUC_MINOR__ >= 9) > 4 \
+ : __GNUC__ + (__GNUC_MINOR__ >= 7) > 4)))
+# define _GL_LOADSTORE8_VARIANT_A 1
+#elif defined _MSC_VER
+# define _GL_LOADSTORE8_VARIANT_E 1
+#else
+# define _GL_LOADSTORE8_VARIANT_F 1
+#endif
+
#if @GNULIB_STDC_LOAD8_ALIGNED@
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least8_t
@@ -1126,33 +1240,64 @@ stdc_load8_aligned_beu8 (const unsigned char ptr[1])
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least16_t
stdc_load8_aligned_beu16 (const unsigned char ptr[2])
{
- uint16_t value = *(const uint16_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return value;
+# if _GL_LOADSTORE8_VARIANT_F
+ return ((uint_fast16_t) ptr[0] << 8) | (uint_fast16_t) ptr[1];
# else
+ uint16_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 2);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return value;
+# else
return bswap_16 (value);
+# endif
# endif
}
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least32_t
stdc_load8_aligned_beu32 (const unsigned char ptr[4])
{
- uint32_t value = *(const uint32_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return value;
+# if _GL_LOADSTORE8_VARIANT_F
+ return ((uint_fast32_t) ptr[0] << 24) | ((uint_fast32_t) ptr[1] << 16)
+ | ((uint_fast32_t) ptr[2] << 8) | (uint_fast32_t) ptr[3];
# else
+ uint32_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 4), 4);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 4);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return value;
+# else
return bswap_32 (value);
+# endif
# endif
}
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least64_t
stdc_load8_aligned_beu64 (const unsigned char ptr[8])
{
- uint64_t value = *(const uint64_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return value;
+# if _GL_LOADSTORE8_VARIANT_F
+ return ((uint_fast64_t) ptr[0] << 56) | ((uint_fast64_t) ptr[1] << 48)
+ | ((uint_fast64_t) ptr[2] << 40) | ((uint_fast64_t) ptr[3] << 32)
+ | ((uint_fast64_t) ptr[4] << 24) | ((uint_fast64_t) ptr[5] << 16)
+ | ((uint_fast64_t) ptr[6] << 8) | (uint_fast64_t) ptr[7];
# else
+ uint64_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 8), 8);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 8);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return value;
+# else
return bswap_64 (value);
+# endif
# endif
}
@@ -1165,112 +1310,113 @@ stdc_load8_aligned_leu8 (const unsigned char ptr[1])
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least16_t
stdc_load8_aligned_leu16 (const unsigned char ptr[2])
{
- uint16_t value = *(const uint16_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return bswap_16 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ return (uint_fast16_t) ptr[0] | ((uint_fast16_t) ptr[1] << 8);
# else
+ uint16_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 2);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return bswap_16 (value);
+# else
return value;
+# endif
# endif
}
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least32_t
stdc_load8_aligned_leu32 (const unsigned char ptr[4])
{
- uint32_t value = *(const uint32_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return bswap_32 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ return (uint_fast32_t) ptr[0] | ((uint_fast32_t) ptr[1] << 8)
+ | ((uint_fast32_t) ptr[2] << 16) | ((uint_fast32_t) ptr[3] << 24);
# else
+ uint32_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 4), 4);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 4);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return bswap_32 (value);
+# else
return value;
+# endif
# endif
}
_GL_STDC_LOAD8_ALIGNED_INLINE uint_least64_t
stdc_load8_aligned_leu64 (const unsigned char ptr[8])
{
- uint64_t value = *(const uint64_t *)ptr;
-# ifdef WORDS_BIGENDIAN
- return bswap_64 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ return (uint_fast64_t) ptr[0] | ((uint_fast64_t) ptr[1] << 8)
+ | ((uint_fast64_t) ptr[2] << 16) | ((uint_fast64_t) ptr[3] << 24)
+ | ((uint_fast64_t) ptr[4] << 32) | ((uint_fast64_t) ptr[5] << 40)
+ | ((uint_fast64_t) ptr[6] << 48) | ((uint_fast64_t) ptr[7] << 56);
# else
+ uint64_t value;
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (&value, __builtin_assume_aligned (ptr, 8), 8);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (&value, ptr, 8);
+# endif
+# ifdef WORDS_BIGENDIAN
+ return bswap_64 (value);
+# else
return value;
+# endif
# endif
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least8_t
stdc_load8_aligned_bes8 (const unsigned char ptr[1])
{
- return *(signed char *)ptr;
+ return (int8_t) ptr[0];
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least16_t
stdc_load8_aligned_bes16 (const unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- return *(const int16_t *)ptr;
-# else
- uint16_t value = *(const uint16_t *)ptr;
- return (int16_t) bswap_16 (value);
-# endif
+ return (int16_t) stdc_load8_aligned_beu16 (ptr);
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least32_t
stdc_load8_aligned_bes32 (const unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- return *(const int32_t *)ptr;
-# else
- uint32_t value = *(const uint32_t *)ptr;
- return (int32_t) bswap_32 (value);
-# endif
+ return (int32_t) stdc_load8_aligned_beu32 (ptr);
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least64_t
stdc_load8_aligned_bes64 (const unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- return *(const int64_t *)ptr;
-# else
- uint64_t value = *(const uint64_t *)ptr;
- return (int64_t) bswap_64 (value);
-# endif
+ return (int64_t) stdc_load8_aligned_beu64 (ptr);
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least8_t
stdc_load8_aligned_les8 (const unsigned char ptr[1])
{
- return *(signed char *)ptr;
+ return (int8_t) ptr[0];
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least16_t
stdc_load8_aligned_les16 (const unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- uint16_t value = *(const uint16_t *)ptr;
- return (int16_t) bswap_16 (value);
-# else
- return *(const int16_t *)ptr;
-# endif
+ return (int16_t) stdc_load8_aligned_leu16 (ptr);
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least32_t
stdc_load8_aligned_les32 (const unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- uint32_t value = *(const uint32_t *)ptr;
- return (int32_t) bswap_32 (value);
-# else
- return *(const int32_t *)ptr;
-# endif
+ return (int32_t) stdc_load8_aligned_leu32 (ptr);
}
_GL_STDC_LOAD8_ALIGNED_INLINE int_least64_t
stdc_load8_aligned_les64 (const unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- uint64_t value = *(const uint64_t *)ptr;
- return (int64_t) bswap_64 (value);
-# else
- return *(const int64_t *)ptr;
-# endif
+ return (int64_t) stdc_load8_aligned_leu64 (ptr);
}
#endif
@@ -1336,7 +1482,7 @@ stdc_load8_leu64 (const unsigned char ptr[8])
_GL_STDC_LOAD8_INLINE int_least8_t
stdc_load8_bes8 (const unsigned char ptr[1])
{
- return *(signed char *)ptr;
+ return (int8_t) ptr[0];
}
_GL_STDC_LOAD8_INLINE int_least16_t
@@ -1366,7 +1512,7 @@ stdc_load8_bes64 (const unsigned char ptr[8])
_GL_STDC_LOAD8_INLINE int_least8_t
stdc_load8_les8 (const unsigned char ptr[1])
{
- return *(signed char *)ptr;
+ return (int8_t) ptr[0];
}
_GL_STDC_LOAD8_INLINE int_least16_t
@@ -1409,30 +1555,71 @@ stdc_store8_aligned_beu8 (uint_least8_t value, unsigned char ptr[1])
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_beu16 (uint_least16_t value, unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- *(uint16_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) (value >> 8) & 0xFFU;
+ ptr[1] = (unsigned char) value & 0xFFU;
# else
- *(uint16_t *)ptr = bswap_16 (value);
+ uint16_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = value;
+# else
+ uvalue = bswap_16 (value);
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 2), &uvalue, 2);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 2);
+# endif
# endif
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_beu32 (uint_least32_t value, unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- *(uint32_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) (value >> 24) & 0xFFU;
+ ptr[1] = (unsigned char) (value >> 16) & 0xFFU;
+ ptr[2] = (unsigned char) (value >> 8) & 0xFFU;
+ ptr[3] = (unsigned char) value & 0xFFU;
# else
- *(uint32_t *)ptr = bswap_32 (value);
+ uint32_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = value;
+# else
+ uvalue = bswap_32 (value);
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 4), &uvalue, 4);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 4);
+# endif
# endif
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_beu64 (uint_least64_t value, unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- *(uint64_t *)ptr = value;
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) (value >> 56) & 0xFFU;
+ ptr[1] = (unsigned char) (value >> 48) & 0xFFU;
+ ptr[2] = (unsigned char) (value >> 40) & 0xFFU;
+ ptr[3] = (unsigned char) (value >> 32) & 0xFFU;
+ ptr[4] = (unsigned char) (value >> 24) & 0xFFU;
+ ptr[5] = (unsigned char) (value >> 16) & 0xFFU;
+ ptr[6] = (unsigned char) (value >> 8) & 0xFFU;
+ ptr[7] = (unsigned char) value & 0xFFU;
# else
- *(uint64_t *)ptr = bswap_64 (value);
+ uint64_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = value;
+# else
+ uvalue = bswap_64 (value);
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 8), &uvalue, 8);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 8);
+# endif
# endif
}
@@ -1445,103 +1632,120 @@ stdc_store8_aligned_leu8 (uint_least8_t value, unsigned char ptr[1])
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_leu16 (uint_least16_t value, unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- *(uint16_t *)ptr = bswap_16 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) value & 0xFFU;
+ ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
# else
- *(uint16_t *)ptr = value;
+ uint16_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = bswap_16 (value);
+# else
+ uvalue = value;
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 2), &uvalue, 2);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 2);
+# endif
# endif
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_leu32 (uint_least32_t value, unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- *(uint32_t *)ptr = bswap_32 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) value & 0xFFU;
+ ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
+ ptr[2] = (unsigned char) (value >> 16) & 0xFFU;
+ ptr[3] = (unsigned char) (value >> 24) & 0xFFU;
# else
- *(uint32_t *)ptr = value;
+ uint32_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = bswap_32 (value);
+# else
+ uvalue = value;
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 4), &uvalue, 4);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 4);
+# endif
# endif
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_leu64 (uint_least64_t value, unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- *(uint64_t *)ptr = bswap_64 (value);
+# if _GL_LOADSTORE8_VARIANT_F
+ ptr[0] = (unsigned char) value & 0xFFU;
+ ptr[1] = (unsigned char) (value >> 8) & 0xFFU;
+ ptr[2] = (unsigned char) (value >> 16) & 0xFFU;
+ ptr[3] = (unsigned char) (value >> 24) & 0xFFU;
+ ptr[4] = (unsigned char) (value >> 32) & 0xFFU;
+ ptr[5] = (unsigned char) (value >> 40) & 0xFFU;
+ ptr[6] = (unsigned char) (value >> 48) & 0xFFU;
+ ptr[7] = (unsigned char) (value >> 56) & 0xFFU;
# else
- *(uint64_t *)ptr = value;
+ uint64_t uvalue;
+# ifdef WORDS_BIGENDIAN
+ uvalue = bswap_64 (value);
+# else
+ uvalue = value;
+# endif
+# if _GL_LOADSTORE8_VARIANT_A
+ memcpy (__builtin_assume_aligned (ptr, 8), &uvalue, 8);
+# else /* _GL_LOADSTORE8_VARIANT_E */
+ memcpy (ptr, &uvalue, 8);
+# endif
# endif
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_bes8 (int_least8_t value, unsigned char ptr[1])
{
- *(signed char *)ptr = value;
+ ptr[0] = (uint8_t) value;
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_bes16 (int_least16_t value, unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- *(int16_t *)ptr = value;
-# else
- *(uint16_t *)ptr = bswap_16 ((uint16_t) value);
-# endif
+ stdc_store8_aligned_beu16 ((uint16_t) value, ptr);
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_bes32 (int_least32_t value, unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- *(int32_t *)ptr = value;
-# else
- *(uint32_t *)ptr = bswap_32 ((uint32_t) value);
-# endif
+ stdc_store8_aligned_beu32 ((uint32_t) value, ptr);
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_bes64 (int_least64_t value, unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- *(int64_t *)ptr = value;
-# else
- *(uint64_t *)ptr = bswap_64 ((uint64_t) value);
-# endif
+ stdc_store8_aligned_beu64 ((uint64_t) value, ptr);
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_les8 (int_least8_t value, unsigned char ptr[1])
{
- *(signed char *)ptr = value;
+ ptr[0] = (uint8_t) value;
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_les16 (int_least16_t value, unsigned char ptr[2])
{
-# ifdef WORDS_BIGENDIAN
- *(uint16_t *)ptr = bswap_16 ((uint16_t) value);
-# else
- *(int16_t *)ptr = value;
-# endif
+ stdc_store8_aligned_leu16 ((uint16_t) value, ptr);
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_les32 (int_least32_t value, unsigned char ptr[4])
{
-# ifdef WORDS_BIGENDIAN
- *(uint32_t *)ptr = bswap_32 ((uint32_t) value);
-# else
- *(int32_t *)ptr = value;
-# endif
+ stdc_store8_aligned_leu32 ((uint32_t) value, ptr);
}
_GL_STDC_STORE8_ALIGNED_INLINE void
stdc_store8_aligned_les64 (int_least64_t value, unsigned char ptr[8])
{
-# ifdef WORDS_BIGENDIAN
- *(uint64_t *)ptr = bswap_64 ((uint64_t) value);
-# else
- *(int64_t *)ptr = value;
-# endif
+ stdc_store8_aligned_leu64 ((uint64_t) value, ptr);
}
#endif
--
2.52.0
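// Test program for comparing the code generated for the load16 variants
// below. It has no main function: compile it with optimization (e.g.
// 'gcc -O2 -S') for each target and compare the generated assembly.
// The types are defined by hand so that no system headers are needed.
#define uint16_t unsigned short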
#define size_t unsigned long
#define uintptr_t unsigned long
#ifdef __cplusplus
extern "C" {
#endif
extern void *memcpy (void *, const void *, size_t);
#ifdef __cplusplus
}
#endif
#if !defined _MSC_VER
uint16_t
load16 (const unsigned char ptr[2])
{ return *(const uint16_t *)ptr; }
// Trick from Lasse Collin: use memcpy and __builtin_assume_aligned.
uint16_t
load16_a (const unsigned char ptr[2])
{
uint16_t value;
memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
return value;
}
// Use __builtin_assume_aligned, without memcpy.
uint16_t
load16_b (const unsigned char ptr[2])
{
const unsigned char *aptr =
(const unsigned char *) __builtin_assume_aligned (ptr, 2);
#if WORDS_BIGENDIAN
return ((uint16_t) aptr [0] << 8) | (uint16_t) aptr [1];
#else
return (uint16_t) aptr [0] | ((uint16_t) aptr [1] << 8);
#endif
}
#endif
#if defined _MSC_VER
// Use memcpy and __assume.
uint16_t
load16_c (const unsigned char ptr[2])
{
__assume (((uintptr_t) ptr & (2 - 1)) == 0);
uint16_t value;
memcpy (&value, ptr, 2);
return value;
}
// Use __assume, without memcpy.
uint16_t
load16_d (const unsigned char ptr[2])
{
__assume (((uintptr_t) ptr & (2 - 1)) == 0);
#if WORDS_BIGENDIAN
return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
#else
return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
#endif
}
#endif
// Use memcpy, without __builtin_assume_aligned or __assume.
uint16_t
load16_e (const unsigned char ptr[2])
{
uint16_t value;
memcpy (&value, ptr, 2);
return value;
}
// Use the code for the unaligned case.
uint16_t
load16_f (const unsigned char ptr[2])
{
#if WORDS_BIGENDIAN
return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
#else
return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
#endif
}
// Modern gcc:
// arm64 all good
// armelhf all good
// i686 all good
// powerpc all good
// powerpc64 all good
// powerpc64le all good
// s390x all good
// x86_64 all good
// alpha a,b good; f medium; e worst
// arm a,b good; f medium; e worst
// hppa a,b good; f medium; e worst
// mips a,b good; f medium; e worst
// mips64 a,b good; f medium; e worst
// riscv64 a,b good; f medium; e worst
// sh4 a,b good; f medium; e worst
// sparc64 a,b good; f medium; e worst
// Older gcc, x86_64:
// 10..15 all good
// 4.7..9 a,e good; b,f medium
// MSVC c,e good; d,f medium
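The patch also rewrites the signed functions, for example stdc_load8_aligned_bes16 and stdc_store8_aligned_bes16. Instead of accessing memory through int16_t pointers, they now delegate to their unsigned counterparts with a cast. Below is a minimal sketch of that pattern. The my_* functions are hypothetical stand-ins for the unsigned functions. They are written in the byte-shift style of variant f, so the sketch does not need Gnulib's <stdbit.h>. Converting a uint16_t above INT16_MAX to int16_t is implementation-defined, and the round trip assumes the reduction modulo 2^16 that GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for stdc_load8_aligned_beu16 and
   stdc_store8_aligned_beu16 (big-endian byte order).  */
static uint_least16_t
my_load_beu16 (const unsigned char ptr[2])
{
  return ((uint_fast16_t) ptr[0] << 8) | (uint_fast16_t) ptr[1];
}

static void
my_store_beu16 (uint_least16_t value, unsigned char ptr[2])
{
  ptr[0] = (unsigned char) ((value >> 8) & 0xFFU);
  ptr[1] = (unsigned char) (value & 0xFFU);
}

/* Signed variants delegate to the unsigned ones, as in the patch.  */
static int_least16_t
my_load_bes16 (const unsigned char ptr[2])
{
  return (int16_t) my_load_beu16 (ptr);
}

static void
my_store_bes16 (int_least16_t value, unsigned char ptr[2])
{
  my_store_beu16 ((uint16_t) value, ptr);
}

/* Self-check: boundary values survive a store/load round trip, and the
   sign bit lands in the first byte (big-endian layout).  */
static void
check_bes16 (void)
{
  static const int_least16_t samples[] =
    { 0, 1, -1, -2, INT16_MAX, INT16_MIN };
  unsigned char buf[2];
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      my_store_bes16 (samples[i], buf);
      assert (my_load_bes16 (buf) == samples[i]);
    }
  my_store_bes16 (-2, buf);
  assert (buf[0] == 0xFF && buf[1] == 0xFE);
}
```

Delegating in this way means that the aliasing-safe memcpy or byte-shift code needs to be written only once per width. The signed functions then reduce to a cast.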