On Wed, Feb 26, 2025 at 01:24:22PM +0100, Daniel Sahlberg wrote:
> Looking at the code, I'm assuming that if SVN_UNALIGNED_ACCESS_IS_OK was
> set to 0 (so the char by char loop after #endif is used instead), the code
> would run just fine. Can you confirm that?
> 
> SVN_UNALIGNED_ACCESS_IS_OK is defined as follows - we'd probably have to
> change that for GCC15.

Unaligned access is undefined behaviour (UB) in C, even on CPUs whose
hardware tolerates it. Compilers are now taking more liberties with
undefined behaviour than they used to, so we need to be very careful
about our assumptions regarding UB.

Our code which is guarded by SVN_UNALIGNED_ACCESS_IS_OK might become
unsafe when built with newer compilers on any platform. This code's
behaviour is no longer under our control but under the compiler's.
svn_eol__find_eol_start() might crash, become a no-op, or have unknown
side-effects resulting in annoyances or even CVEs.

At the very least, we should have this macro off by default and require
it to be enabled manually. For best safety, the code should be removed.
I understand that removing this code will incur a performance cost.
The code was backed up by elaborate performance testing when it was
written years ago.
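If the word-at-a-time scan is worth keeping, one commonly used UB-free
alternative to casting a char pointer to a word pointer is to load each
word with memcpy(), which has no alignment requirement; optimizing
compilers typically turn a fixed-size memcpy() into a single load on x86.
This is a hypothetical sketch, not svn_eol__find_eol_start(): the names
load_word() and find_eol_sketch() are made up, it assumes an EOL search
only needs the first '\r' or '\n', and its speed relative to the existing
code is untested.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a machine word from any address without UB.
 * memcpy() has no alignment requirement and does not violate aliasing. */
static uintptr_t load_word(const char *p)
{
  uintptr_t w;
  memcpy(&w, p, sizeof w);
  return w;
}

/* Hypothetical sketch: return a pointer to the first '\r' or '\n' in
 * buf[0..len), or NULL if there is none. */
static const char *find_eol_sketch(const char *buf, size_t len)
{
  const uintptr_t ones = (uintptr_t)-1 / 0xFF;  /* 0x0101...01 */
  const uintptr_t highs = ones * 0x80;          /* 0x8080...80 */
  const char *p = buf;
  const char *end = buf + len;

  /* Skip whole words that contain neither '\r' nor '\n'. */
  while ((size_t)(end - p) >= sizeof(uintptr_t))
    {
      uintptr_t w = load_word(p);
      uintptr_t r = w ^ (ones * '\r');  /* bytes equal to '\r' become 0 */
      uintptr_t n = w ^ (ones * '\n');  /* bytes equal to '\n' become 0 */

      /* Classic "has a zero byte" test; it never misses a zero byte. */
      if (((r - ones) & ~r & highs) || ((n - ones) & ~n & highs))
        break;  /* candidate word: finish it byte by byte below */
      p += sizeof(uintptr_t);
    }

  /* Byte-by-byte scan of the candidate word and the remaining tail. */
  for (; p < end; p++)
    if (*p == '\r' || *p == '\n')
      return p;
  return NULL;
}
```

Calling it at a deliberately misaligned offset (e.g. buf + 1) is fine,
because no pointer is ever converted to a word pointer.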

But what modern compilers make of UB will likely keep changing for the
worse (or better, depending on which side of the UB-fence you are on),
violating assumptions about platform behaviour in the macro definition:

> #ifndef SVN_UNALIGNED_ACCESS_IS_OK
> # if defined(_M_IX86) || defined(i386) \
>      || defined(_M_X64) || defined(__x86_64) \
>      || defined(__powerpc__) || defined(__ppc__)
> #  define SVN_UNALIGNED_ACCESS_IS_OK 1
> # else
> #  define SVN_UNALIGNED_ACCESS_IS_OK 0
> # endif
> #endif
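Applied to the definition quoted above, the opt-in default suggested
earlier could be as small as this (a sketch; how the build system would
expose the switch is an assumption):

```c
/* Fast unaligned path is off unless the builder explicitly passes
 * -DSVN_UNALIGNED_ACCESS_IS_OK=1. */
#ifndef SVN_UNALIGNED_ACCESS_IS_OK
# define SVN_UNALIGNED_ACCESS_IS_OK 0
#endif
```

Note that because of the existing #ifndef guard, builders can already
force the safe path today with -DSVN_UNALIGNED_ACCESS_IS_OK=0; the change
only flips which choice needs to be made explicitly.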

I have seen contrived programs being translated into nothing on x86 when
built with a relatively recent version of clang (16) at optimization levels
of -O2 and up. For example, the following program will literally have its
entire main() function elided:

[[[
#include <float.h>
int main(void) {
  double v = DBL_MAX;
  int v2 = v;
  if (v2 != v) return 0;
  return 1;
}
]]]
(courtesy of https://bsd.network/@kristapsdz/113870782376321887)

On OpenBSD/amd64, with clang -O2, main() compiles to just 3 instructions:
a function prologue with no body and no ret. The resulting executable
just crashes when run.
[[[
(gdb) disassemble main
   0x0000000000001950 <+0>:     endbr64
   0x0000000000001954 <+4>:     push   %rbp
   0x0000000000001955 <+5>:     mov    %rsp,%rbp
]]]

At -O1 there are 4 lines (it still crashes). Without optimization (-O0)
there are 38 lines of assembly instead of 3, and the program exits
with status 0.
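What makes the quoted program undefined is the conversion itself: the C
standard (C11 6.3.1.4) says converting a floating value to an integer type
is undefined when its integral part cannot be represented, and DBL_MAX
does not fit in an int. A defined version has to range-check in the
double domain before converting. This is a hypothetical sketch; the
helper names are made up, and it assumes a 32-bit int with IEEE-754
doubles.

```c
#include <float.h>
#include <limits.h>

/* Hypothetical helper: nonzero if (int)d is defined, i.e. the integral part
 * of d fits in int. Assumes 32-bit int and IEEE-754 double, where
 * INT_MIN - 1.0 and INT_MAX + 1.0 are exact. NaN fails both comparisons. */
static int double_fits_int(double d)
{
  return d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0;
}

/* Hypothetical UB-free rewrite of the quoted program's logic: returns 0 if
 * v does not survive a round trip through int, 1 if it does. */
static int roundtrips_through_int(double v)
{
  if (!double_fits_int(v))
    return 0;            /* DBL_MAX takes this branch; no conversion occurs */
  int v2 = (int)v;       /* defined: the integral part fits in int */
  if (v2 != v)
    return 0;            /* a fractional part was discarded */
  return 1;
}
```

A main() that does `return roundtrips_through_int(DBL_MAX);` has fully
defined behaviour and should exit with status 0 at any optimization level.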

Granted, what I have seen are contrived cases. But they resulted
from someone looking into problems while doing development in C.
As a C developer, I find this scary enough to avoid UB at all costs,
because I cannot trust the compiler to warn about mistakes of mine
that introduce UB. Rather, the compiler might decide to elide the
code and change the intended meaning of the program.
