https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66002
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
gprof tells me

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  s/call   s/call  name
 67.13     25.40    25.40   2097192    0.00     0.00  contextModel2()
 18.08     32.24     6.84  18874735    0.00     0.00  ContextMap::mix1(Mixer&, int, int, int, int)
  9.46     35.82     3.58   2097192    0.00     0.00  Mixer::p()
  2.72     36.85     1.03  14680344    0.00     0.00  APM1::p(int, int, int)
  0.53     37.05     0.20   2097192    0.00     0.00  dmcModel(Mixer&)

probably not too interesting (inlining).

I wonder if you can run clang++ with vectorization disabled?