On Wed, Feb 26, 2025 at 01:24:22PM +0100, Daniel Sahlberg wrote:
> Looking at the code, I'm assuming that if SVN_UNALIGNED_ACCESS_IS_OK was
> set to 0 (so the char by char loop after #endif is used instead), the code
> would run just fine. Can you confirm that?
>
> SVN_UNALIGNED_ACCESS_IS_OK is defined as follows - we'd probably have to
> change that for GCC15.

Unaligned access is undefined behaviour (UB). Compilers are now taking
more liberties with undefined behaviour than they used to. So we need to
be very careful in our assumptions about UB.

Our code which is guarded by SVN_UNALIGNED_ACCESS_IS_OK might become unsafe
when built with newer compilers on any platform. This code's behaviour is
no longer under our control, but under the compiler's.
svn_eol__find_eol_start() might crash, become a no-op, or have unknown
side-effects resulting in annoyances or even CVEs.

At the very least, we should have this macro off by default and require
it to be enabled manually. For best safety, the code should be removed.

I understand that removing this code will incur a performance cost.
The code was backed up by elaborate performance testing when it was written
years ago. But what modern compilers make of UB will likely keep changing
for the worse (or better, depending on which side of the UB-fence you are
on), violating assumptions about platform behaviour in the macro definition:

> #ifndef SVN_UNALIGNED_ACCESS_IS_OK
> # if defined(_M_IX86) || defined(i386) \
>      || defined(_M_X64) || defined(__x86_64) \
>      || defined(__powerpc__) || defined(__ppc__)
> #  define SVN_UNALIGNED_ACCESS_IS_OK 1
> # else
> #  define SVN_UNALIGNED_ACCESS_IS_OK 0
> # endif
> #endif

I have seen contrived programs being translated into nothing on x86 when
built with a relatively recent version of clang (16) at optimization
levels of -O2 and up. For example, the following program will literally
have its entire main() function elided:

[[[
#include <float.h>

int main(void)
{
	double v = DBL_MAX;
	int v2 = v;
	if (v2 != v)
		return 0;
	return 1;
}
]]]
(courtesy of https://bsd.network/@kristapsdz/113870782376321887)

The conversion of DBL_MAX to int is out of range for int, which is UB.
On OpenBSD/amd64, with clang -O2 this program compiles to 3 instructions,
an empty function. The resulting executable just crashes when run.

[[[
(gdb) disassemble main
   0x0000000000001950 <+0>:	endbr64
   0x0000000000001954 <+4>:	push   %rbp
   0x0000000000001955 <+5>:	mov    %rsp,%rbp
]]]

At -O1 there are 4 lines (still crashes). Without optimization (-O0) there
are 38 lines of assembly instead of 3, and the program exits with status 0.

Granted, what I have seen are contrived cases. But they resulted from
someone looking into problems while doing development in C.

As a C developer, this scares me enough to avoid UB at all costs because
I cannot trust the compiler to warn about my mistakes which introduce UB.
Rather, the compiler might decide to elide the code and change the
intended meaning of the program.
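
For what it's worth, scanning a machine word at a time does not strictly
require dereferencing a misaligned pointer. Below is a minimal sketch,
not our actual code: find_eol_start() and the constants inside it are
hypothetical names made up for illustration. It copies each chunk into a
local variable with memcpy(), which is well-defined for any alignment,
and then uses the usual bit tricks to check whether any byte in the chunk
is '\r' or '\n'. Whether a given compiler turns the memcpy() into a
single load, and whether this recovers the performance of the existing
code, is an assumption that would need to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: return a pointer to the first '\r' or '\n' in
 * BUF (of length LEN), or NULL if there is none. */
const char *
find_eol_start(const char *buf, size_t len)
{
  /* Per-byte patterns replicated across one machine word. */
  const uintptr_t ones = (uintptr_t)-1 / 0xFF;  /* 0x0101...01 */
  const uintptr_t low7 = ones * 0x7F;           /* 0x7f7f...7f */
  const uintptr_t bit7 = ones * 0x80;           /* 0x8080...80 */
  const uintptr_t r_mask = ones * '\r';
  const uintptr_t n_mask = ones * '\n';

  /* Skip whole words which contain neither '\r' nor '\n'. */
  for (; len >= sizeof(uintptr_t);
       buf += sizeof(uintptr_t), len -= sizeof(uintptr_t))
    {
      uintptr_t chunk, r_test, n_test;

      /* Well-defined for any alignment of BUF, unlike a pointer cast. */
      memcpy(&chunk, buf, sizeof(chunk));

      /* A byte becomes zero where it matched '\r' or '\n'. */
      r_test = chunk ^ r_mask;
      n_test = chunk ^ n_mask;

      /* Set bit 7 of every non-zero byte; no carry crosses bytes. */
      r_test |= (r_test & low7) + low7;
      n_test |= (n_test & low7) + low7;

      /* A cleared bit 7 means some byte in this word is an EOL char. */
      if ((r_test & n_test & bit7) != bit7)
        break;
    }

  /* Locate the exact byte (or scan the tail) one char at a time. */
  for (; len > 0; ++buf, --len)
    if (*buf == '\n' || *buf == '\r')
      return buf;

  return NULL;
}
```

The char by char loop at the end is the same fallback we already have,
so the result does not depend on byte order. The only thing the word loop
does is skip ahead over chunks which cannot contain a line ending.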