https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108140
Bug ID: 108140
Summary: tzcnt gives different result in debug vs release
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: levo.delellis at gmail dot com
Target Milestone: ---

This might be more than one bug, and I have also gotten the compiler to crash. Tested on macOS Ventura on an Apple M2, but it may also happen on ARMv8 Linux. This slightly different test also fails with -O2 on my Mac: https://godbolt.org/z/xv883jMb9

The GCC docs say the result for 0 might be undefined, and I understand that:

> Built-in Function: int __builtin_ctz (unsigned int x)
> Returns the number of trailing 0-bits in x, starting at the least
> significant bit position. If x is 0, the result is undefined.

From my understanding, ARMv8 doesn't have a count-trailing-zeros instruction; it is implemented using rbit (reverse bits) and clz. The CLZ documentation says that when you give it a 64-bit register it returns 64 for 0.
https://developer.arm.com/documentation/ddi0596/2020-12/Base-Instructions/CLZ--Count-Leading-Zeros-

Now here's the problem. I would think __builtin_ctz would be those two instructions. So I tried the program below, compiled and ran it using `gcc -Wall -Wextra test.c && ./a.out`, and saw that it worked as expected. However, my mistake was stopping there. Using `gcc -O2 -Wall -Wextra test.c && ./a.out` gets 456 instead, with no warnings or anything. Looking at the assembly, it appears the check has been optimized out and 456 is used.

The "ARM C Language Extensions Architecture Specification" suggests including arm_acle.h. So I replaced the line in the program below with the following line and still got the incorrect result:

unsigned long long tz = __clz((unsigned long long)__rbit(input));

I'm not sure if this is another bug, but this crashes with -O2: https://godbolt.org/z/xv883jMb9
It also doesn't give me the result I expected. rbit appears to give me 32 no matter what I write.
The documentation says it should give 64:
https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/RBIT--vector---Reverse-Bit-order--vector--?lang=en

Anyway, I would like to be warned about these problems somehow. Homebrew on Mac doesn't seem to have the undefined behavior sanitizer (although I'm new to Mac and may have set it up incorrectly). UBSan would be great for warning about this. Alternatively, a flag such as -Wprefer-intrinsic could help when the built-ins don't match the CPU behavior.

From what I can tell, __builtin_ctzll doesn't return 32; it does seem to return 64. But when you compare the variable, the optimizer seems to think it will never be greater than 32. That was a problem in my code because I was checking for bits >= 60, so I can't simply test for >= 32.

#include<stdio.h>
int main(int argc, char *argv[])
{
    unsigned long long input = argc-1;
    unsigned long long v = __builtin_ctzll(input);
    printf("%d %d\n", argc, v >= 64 ? 123 : 456);
}
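
A possible workaround, as a sketch only: the documentation quoted above leaves the result of the builtin undefined for 0, so the zero case can be handled explicitly before the builtin is called. The helper names ctz64_defined and ctz64_model below are hypothetical, not part of GCC or ACLE. The second function is a plain-C model of the rbit + clz sequence, included only to cross-check the expected value of 64 for a zero input; it makes no claim about what code the compiler actually emits.

```c
#include <stdint.h>

/* Hypothetical helper: count trailing zeros with a defined result of 64
   for x == 0. __builtin_ctzll is undefined for 0, so it is never called
   with 0 here. */
static inline unsigned ctz64_defined(uint64_t x)
{
    return x ? (unsigned)__builtin_ctzll(x) : 64u;
}

/* Hypothetical portable model of "reverse the bits, then count leading
   zeros", used only as a cross-check. */
static inline unsigned ctz64_model(uint64_t x)
{
    uint64_t r = 0;
    unsigned n = 0;

    /* Reverse the 64 bits: bit i of x becomes bit 63 - i of r. */
    for (int i = 0; i < 64; i++)
        r |= ((x >> i) & 1u) << (63 - i);

    /* Count leading zeros of r; yields 64 when r == 0. */
    for (uint64_t m = UINT64_C(1) << 63; m != 0 && !(r & m); m >>= 1)
        n++;

    return n;
}
```

With a helper like this, the check in the test program would become `ctz64_defined(input) >= 64 ? 123 : 456`, which no longer depends on an undefined result when input is 0.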