Den ons 26 feb. 2025 kl 14:30 skrev Stefan Sperling <s...@apache.org>:
> On Wed, Feb 26, 2025 at 01:24:22PM +0100, Daniel Sahlberg wrote: > > Looking at the code, I'm assuming that if SVN_UNALIGNED_ACCESS_IS_OK was > > set to 0 (so the char by char loop after #endif is used instead), the > code > > would run just fine. Can you confirm that? > > > > SVN_UNALIGNED_ACCESS_IS_OK is defined as follows - we'd probably have to > > change that for GCC15. > > Unaligned access is undefined behaviour (UB). Compilers are now taking > more liberties with undefined behaviour than they used to. So we need > to be very careful in our assumptions about UB. > > Our code which is guarded by SVN_UNALIGNED_ACCESS_IS_OK might become > unsafe when built with newer compilers on any platform. This code's > behaviour is no longer under our control, but the compiler's s control. > svn_eol__find_eol_start() might crash, become a no-op, or have unknown > side-effects resulting in annoyances or even CVEs. > > At the very least, we should have this macro off by default and require > it to be enabled manually. For best safety, the code should be removed. > I understand that removing this code will incur a performance cost. > The code was backed up by elaborate performance testing when it was > written years ago. > > But what modern compilers make of UB will likely keep changing for the > worse (or better, depending on which side of the UB-fence you are on), > violating assumptions about platform behaviour in the macro definition: > > > #ifndef SVN_UNALIGNED_ACCESS_IS_OK > > # if defined(_M_IX86) || defined(i386) \ > > || defined(_M_X64) || defined(__x86_64) \ > > || defined(__powerpc__) || defined(__ppc__) > > # define SVN_UNALIGNED_ACCESS_IS_OK 1 > > # else > > # define SVN_UNALIGNED_ACCESS_IS_OK 0 > > # endif > > #endif > > I have seen contrived programs being translated into nothing on x86 when > built with a relatively recent version of clang (16) at optimization levels > of -O2 and up. For example, the following program will literally have its > entire main() function elided: > > [[[ > #include <float.h> > int main(void) { > double v = DBL_MAX; > int v2 = v; > if (v2 != v) return 0; > return 1; > } > ]]] > (courtesy of https://bsd.network/@kristapsdz/113870782376321887) > > On OpenBSD/amd64, with clang -O2 this program compiles to 3 instructions, > an empty function. The resulting executable just crashes when run. > [[[ > (gdb) disassemble main > 0x0000000000001950 <+0>: endbr64 > 0x0000000000001954 <+4>: push %rbp > 0x0000000000001955 <+5>: mov %rsp,%rbp > ]]] > > At -01 there are 4 lines (still crashes). Without optimization (-O0) > there are 38 lines of assembly instead of 3, and the program exits > with status 0. > > Granted, what I have seen are contrived cases. But they resulted > from someone looking into problems while doing development in C. > As a C developer, this scares me enough to avoid UB at all costs > because I cannot trust the compiler to warn about my mistakes > which introduce UB. Rather, the compiler might decide to elide > the code and change the intended meaning of the program. > Thanks for the detailed explanation. It seems really ugly. Would add a separate for-loop before "one machine word at a time", iterating until buf is aligned (maximum sizeof(void*)-1 times) help here? Any idea of the performance gain on SVN_UNALIGNED_ACCESS_IS_OK on modern hardware/modern compilers? Cheers, Daniel