https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87106

            Bug ID: 87106
           Summary: Group move and destruction of the source, where
                    possible, for speed
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

Just a random testcase so I can give numbers; I don't claim this is a good
testcase at all.

#include <string>
#include <vector>

__attribute__((flatten))
void f(){
  int n = 1024*1024;
  std::vector<std::string> v(n);
  v.resize(n+1);
}
int main(){
  for(int i=0;i<256;++i) f();
}

This runs in about 2.4s now. In _M_default_append, we have a first loop that
copies (moves) strings from old to new, and a second loop that destroys old.
If I comment out the destroying loop (not something we should do in general,
this is just for the numbers), the running time goes down to 2.0s. If I
replace the 2 loops with a single loop that does both move and destroy, the
running time is now 1.6s.

Move+destroy (aka destructive move, relocation, etc.) are 2 operations that go
well together and are likely to simplify when combined. Ideally the compiler
would merge the 2 loops (loop fusion) for us, but it doesn't. Doing the
operations in this order is only valid here because std::string can be
moved+destroyed nothrow: if a move could throw partway through, the sources
that were already destroyed could not be restored.

I think it would be nice to introduce a special case for nothrow-relocatable
types in several functions for std::vector (_M_default_append is just one
among several, and probably not the most important one). If that makes the
code simpler, we could use if constexpr and limit the optimization to recent
standards. If one of the relocation papers ever makes it through the
committee, it will likely require this optimization (or at least make it an
important QoI point).

There are probably places outside of vector that could also benefit, but
vector looks like a good starting point.
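
For illustration only, the two loop structures discussed above could be sketched roughly as follows. This is not the libstdc++ code: the helper names move_then_destroy, relocate_fused and relocation_demo are hypothetical, and the sketch simply assumes a type that is nothrow move constructible and nothrow destructible, which is the condition the report relies on for std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: two passes, in the shape the report describes for
// _M_default_append (move everything first, then destroy the old range).
template <typename T>
void move_then_destroy(T* first, T* last, T* dest) {
    static_assert(std::is_nothrow_move_constructible<T>::value,
                  "sketch assumes a nothrow move constructor");
    assert(first == last || dest != nullptr);
    for (T* p = first; p != last; ++p, ++dest)
        ::new (static_cast<void*>(dest)) T(std::move(*p));
    for (T* p = first; p != last; ++p)
        p->~T();
}

// Hypothetical helper: one fused pass. Each source is destroyed right after
// it has been moved from, which is only safe if neither step can throw.
template <typename T>
void relocate_fused(T* first, T* last, T* dest) {
    static_assert(std::is_nothrow_move_constructible<T>::value &&
                  std::is_nothrow_destructible<T>::value,
                  "sketch assumes nothrow move construction and destruction");
    assert(first == last || dest != nullptr);
    for (; first != last; ++first, ++dest) {
        ::new (static_cast<void*>(dest)) T(std::move(*first));
        first->~T();
    }
}

// Grows raw storage holding n strings by one default-constructed element,
// loosely imitating v.resize(n+1) on a full vector. Returns true if every
// element arrived intact in the new storage.
inline bool relocation_demo(std::size_t n, bool fused) {
    typedef std::string S;
    std::allocator<S> alloc;

    S* old_buf = alloc.allocate(n);
    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void*>(old_buf + i)) S(i % 32, 'x');

    S* new_buf = alloc.allocate(n + 1);
    if (fused)
        relocate_fused(old_buf, old_buf + n, new_buf);
    else
        move_then_destroy(old_buf, old_buf + n, new_buf);
    ::new (static_cast<void*>(new_buf + n)) S();  // the appended element
    alloc.deallocate(old_buf, n);  // old elements are already destroyed

    bool ok = new_buf[n].empty();
    for (std::size_t i = 0; i < n; ++i)
        ok = ok && new_buf[i].size() == i % 32;

    for (std::size_t i = 0; i <= n; ++i)
        new_buf[i].~S();
    alloc.deallocate(new_buf, n + 1);
    return ok;
}
```

Both helpers leave the destination in the same state; the only difference is whether the old range is walked once or twice. The timings quoted above come from the report's own experiment on std::vector, not from this sketch.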