https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90434
Bug ID: 90434 Summary: [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Hello, Apologies for this being a complicated bug report, but I think I may need some help tracking things down further. Nevertheless, I am moderately confident it is a regression in GCC 9.x, because 8.x works correctly. I have two versions of GCC built today from the tip of stable branches. $ PATH=/local/bin/gcc-8.x/bin:$PATH gcc --version gcc (GCC) 8.3.1 20190511 ... $ PATH=/local/bin/gcc-9.x/bin:$PATH gcc --version gcc (GCC) 9.1.1 20190511 The 8.x version works correctly, while the 9.x version results in incorrect code generation, which leads to a crash when executed. Both versions work fine when LTO is not in use. The environment is http://xenbits.xen.org/docs/xtf/ which is a set of microkerenls. As such, they are freestanding builds and linked to a single final executable, but are standard x86 32bit and 64bit ELF binaries. Because -ffreestanding disables a whole swathe of builtin optimisations, the headers locally reinsert the optimisations in the following manner: int strcmp(const char *s1, const char *s2); #define strcmp(s1, s2) __builtin_strcmp(s1, s2) An implementation of strcmp() is also provided for when a real call is emitted by the builtin: int (strcmp)(const char *_s1, const char *_s2) { unsigned char s1, s2; do { s1 = *_s1++; s2 = *_s2++; } while ( s1 && s1 == s2 ); return (s1 < s2) ? -1 : (s1 > s2); } The problematic test in question calls strcmp() from its main function. With 8.x, the builtin is expanded inline, and (having no callers) the strcmp() function itself is omitted from the final binary. With 9.x, the builtin is not expanded inline, and a call to strcmp() is emitted. strcmp() itself crashes when it first dereferences the second parameter. First of all, I found that if I replace the above strcmp() implementation with this alternative (which I borrowed from the Linux Kernel): int (strcmp)(const char *_s1, const char *_s2) { unsigned char s1, s2; while ( 1 ) { s1 = *_s1++; s2 = *_s2++; if ( s1 != s2 ) return s1 < s2 ? -1 : 1; if ( !s1 ) break; } return 0; } then 9.x *does* expand the builtin inline, and the strcmp() function itself is omitted from the final binary. This binary also executes correctly. Therefore, something is clearly inspecting the content of my locally provided strcmp() function and deciding that it can't expand the builtin inline. However, I'm not sure what it is checking for, because AFAICT, the ABI of those two variations is identical. That on its own would only cause an efficiently problem, not a correctness problem. When inspecting the disassembly of the bad binary, I see that strcmp() has its parameters passed in registers rather than on the stack. AFAICT, the value of the _s1 parameter is always correct, while the value of _s2 seems to be less than 256 (possibly a single char?). The crash occurs because the deference of _s2 hits the zero page. As a result, it is clear that the strcmp() function has been compiled with a non-standard ABI. AFAICT, the calling code in main() does conform to the expected ABI, by passing the parameters on the stack. Therefore, I wonder whether the code actually emitted for strcmp() was supposed to be a specialised version (strcmp.isra.$foo perhaps, or something else), which actually got called with the original ABI. Anyway - that is as far as I've got debugging. Can anyone suggest further steps to help narrow this down?