Den ons 26 feb. 2025 kl 14:30 skrev Stefan Sperling <s...@apache.org>:

> On Wed, Feb 26, 2025 at 01:24:22PM +0100, Daniel Sahlberg wrote:
> > Looking at the code, I'm assuming that if SVN_UNALIGNED_ACCESS_IS_OK was
> > set to 0 (so the char by char loop after #endif is used instead), the
> code
> > would run just fine. Can you confirm that?
> >
> > SVN_UNALIGNED_ACCESS_IS_OK is defined as follows - we'd probably have to
> > change that for GCC15.
>
> Unaligned access is undefined behaviour (UB). Compilers are now taking
> more liberties with undefined behaviour than they used to. So we need
> to be very careful in our assumptions about UB.
>
> Our code which is guarded by SVN_UNALIGNED_ACCESS_IS_OK might become
> unsafe when built with newer compilers on any platform. This code's
> behaviour is no longer under our control, but the compiler's s control.
> svn_eol__find_eol_start() might crash, become a no-op, or have unknown
> side-effects resulting in annoyances or even CVEs.
>
> At the very least, we should have this macro off by default and require
> it to be enabled manually. For best safety, the code should be removed.
> I understand that removing this code will incur a performance cost.
> The code was backed up by elaborate performance testing when it was
> written years ago.
>
> But what modern compilers make of UB will likely keep changing for the
> worse (or better, depending on which side of the UB-fence you are on),
> violating assumptions about platform behaviour in the macro definition:
>
> > #ifndef SVN_UNALIGNED_ACCESS_IS_OK
> > # if defined(_M_IX86) || defined(i386) \
> >      || defined(_M_X64) || defined(__x86_64) \
> >      || defined(__powerpc__) || defined(__ppc__)
> > #  define SVN_UNALIGNED_ACCESS_IS_OK 1
> > # else
> > #  define SVN_UNALIGNED_ACCESS_IS_OK 0
> > # endif
> > #endif
>
> I have seen contrived programs being translated into nothing on x86 when
> built with a relatively recent version of clang (16) at optimization levels
> of -O2 and up. For example, the following program will literally have its
> entire main() function elided:
>
> [[[
> #include <float.h>
> int main(void) {
>   double v = DBL_MAX;
>   int v2 = v;
>   if (v2 != v) return 0;
>   return 1;
> }
> ]]]
> (courtesy of https://bsd.network/@kristapsdz/113870782376321887)
>
> On OpenBSD/amd64, with clang -O2 this program compiles to 3 instructions,
> an empty function. The resulting executable just crashes when run.
> [[[
> (gdb) disassemble main
>    0x0000000000001950 <+0>:     endbr64
>    0x0000000000001954 <+4>:     push   %rbp
>    0x0000000000001955 <+5>:     mov    %rsp,%rbp
> ]]]
>
> At -01 there are 4 lines (still crashes). Without optimization (-O0)
> there are 38 lines of assembly instead of 3, and the program exits
> with status 0.
>
> Granted, what I have seen are contrived cases. But they resulted
> from someone looking into problems while doing development in C.
> As a C developer, this scares me enough to avoid UB at all costs
> because I cannot trust the compiler to warn about my mistakes
> which introduce UB. Rather, the compiler might decide to elide
> the code and change the intended meaning of the program.
>

Thanks for the detailed explanation. It seems really ugly.

Would add a separate for-loop before "one machine word at a time",
iterating until buf is aligned (maximum sizeof(void*)-1 times) help here?

Any idea of the performance gain on SVN_UNALIGNED_ACCESS_IS_OK on modern
hardware/modern compilers?

Cheers,
Daniel

Reply via email to