https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
--- Comment #5 from Yann Collet <yann.collet.73 at gmail dot com> ---
Complementary information:

-Winline: does not output anything (is that normal?)
-fdump-ipa-inline: produces several large files, the interesting one being 1.5 MB long. That's a huge dump to analyze.

Nonetheless, I had a deeper look directly at the function whose speed is affected. Looking at both the slow and fast versions, I could spot *no difference* in the inlining decisions. From what I can tell, the dump files seem strictly identical (note: there could be differences somewhere else that I did not spot).

Since then, it has also been suggested to me that this effect could be related to something else: instruction cache line alignment.
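
One way to probe the alignment hypothesis, as a rough sketch only: pin the hot function to a cache-line boundary and see whether the slow/fast difference goes away. The function below is a hypothetical stand-in, not the code from this report, and the 64-byte line size is an assumption (typical for current x86 parts). The `aligned` function attribute is a GCC extension; the code also assumes a target where a function pointer converts to its plain entry address.

```c
#include <stdint.h>

/* Hypothetical stand-in for the affected function (not the code from the
 * report). aligned(64) asks GCC to place the first instruction on a 64-byte
 * boundary; 64 is an assumed cache line size. noinline keeps the function
 * as a separate symbol so the alignment actually applies to it. */
__attribute__((aligned(64), noinline))
unsigned hot_loop(const unsigned char *p, unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Returns 1 when the entry point of hot_loop sits on a 64-byte boundary.
 * Converting a function pointer to an integer is implementation-defined,
 * but it yields the entry address with GCC on x86. */
int hot_loop_is_aligned(void)
{
    return ((uintptr_t)&hot_loop % 64u) == 0;
}
```

The same experiment can be run without touching the source by rebuilding with -falign-functions=64 (and possibly -falign-loops), then timing both builds again. If the timing gap disappears, alignment is a plausible cause; if it remains, the explanation probably lies elsewhere.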