https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78113
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(looking at the first testcase) There are two separate issues. One is the implementation strategy in libstdc++ vs boost vs others (I don't know which is best; it probably depends on the application). The other is that GCC's inliner is very badly suited to this kind of code, as we have been seeing for a while with std::any, std::function, etc. Even if I manually unroll the loop (the unroller runs too late) and force as much inlining as possible, optimizing it fully would require cycling between inlining and FRE (full redundancy elimination, or any similar pass that replaces a memory load with whatever value was last stored there).
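To illustrate why the two passes have to alternate, here is a hypothetical two-level type-erased call, loosely in the spirit of std::function but NOT libstdc++'s actual code. All names (`Erased`, `add_one`, `forward_inner`, `call`, `use_erased`) are made up for this sketch, and the comments describe the transformations an optimizer would need, not what GCC is claimed to do on this snippet:

```cpp
#include <cassert>

// Hypothetical type-erased callable: a function pointer stored in memory,
// plus an optional pointer to another erased callable (one more level).
struct Erased {
    int (*invoke)(const Erased*, int);
    const Erased* inner;  // next level, or nullptr
};

// Innermost target.
static int add_one(const Erased*, int x) { return x + 1; }

// Middle level: loads inner->invoke from memory and calls through it.
static int forward_inner(const Erased* self, int x) {
    return self->inner->invoke(self->inner, x);
}

// Entry point: loads e.invoke from memory and calls through it.
static int call(const Erased& e, int x) { return e.invoke(&e, x); }

int use_erased(int x) {
    Erased inner{&add_one, nullptr};
    Erased outer{&forward_inner, &inner};
    // Round 1: inline call(); FRE replaces the load of outer.invoke with the
    //          stored &forward_inner, turning the indirect call into a direct one.
    // Round 2: inline forward_inner(); FRE replaces the loads of outer.inner
    //          and inner.invoke with &inner and &add_one.
    // Round 3: inline add_one(); what remains is just "return x + 1".
    // Each inlining step only becomes possible after the previous FRE step.
    return call(outer, x);
}
```

With every level resolved, `use_erased(41)` returns 42. Each extra level of erasure adds another inline-then-forward round, which is why a pipeline that runs inlining and FRE a fixed, small number of times struggles with deeply nested code of this shape.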