https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
Bug ID: 99728
Summary: code pessimization when using wrapper classes around SIMD types
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: mar...@mpa-garching.mpg.de
Target Milestone: ---

Created attachment 50456
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50456&action=edit
test case

I originally reported this at
https://gcc.gnu.org/pipermail/gcc-help/2021-March/139976.html, but I'm now
fairly confident that this warrants a PR.

The test case needs to be processed on x86_64 with the command

    g++ -mfma -O3 -std=c++17 -ffast-math -S testcase.cc

Code for two functions will be generated, and I would expect that the
generated assembler for both should be identical. However, for the version
using the wrapper class around __m256d, g++ does not seem to recognize the dead
stores at the end of the loop and leaves them inside the loop body instead of
moving them after the final jump instruction of the loop, which reduces
performance considerably.

clang++ generates nearly identical code for both functions and manages to
remove the dead stores, so I think that g++ might be able to do better here and
is not pessimizing the code due to some C++ intricacies.
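
For readers without access to the attachment, the general shape of such a comparison might look like the minimal sketch below. It is not the attached test case: the names Vec4, kernel_raw, kernel_wrapped and check_variants_agree, as well as the loop body, are assumptions made purely for illustration, and the sketch is not claimed to reproduce the dead-store behaviour described above. It relies on the arithmetic operators that g++ and clang++ provide for __m256d through their vector extensions.

```cpp
// Hypothetical sketch only; NOT the test case attached to this PR.
#include <immintrin.h>
#include <cassert>
#include <cstddef>

// Thin wrapper class around the raw SIMD type (illustrative name).
struct Vec4 {
    __m256d v;
    Vec4() = default;
    Vec4(__m256d x) : v(x) {}
    Vec4 &operator+=(const Vec4 &o) { v += o.v; return *this; }
    friend Vec4 operator*(const Vec4 &a, const Vec4 &b) { return Vec4(a.v * b.v); }
};

// Variant 1: accumulators are plain __m256d values.
void kernel_raw(const __m256d *x, const __m256d *y, std::size_t n, __m256d *out) {
    __m256d acc0 = out[0], acc1 = out[1];
    for (std::size_t i = 0; i < n; ++i) {
        acc0 += x[i] * y[i];
        acc1 += x[i] * x[i];
    }
    out[0] = acc0;  // results are written back once, after the loop
    out[1] = acc1;
}

// Variant 2: the same computation expressed through the wrapper class.
void kernel_wrapped(const Vec4 *x, const Vec4 *y, std::size_t n, Vec4 *out) {
    Vec4 acc0 = out[0], acc1 = out[1];
    for (std::size_t i = 0; i < n; ++i) {
        acc0 += x[i] * y[i];
        acc1 += x[i] * x[i];
    }
    out[0] = acc0;
    out[1] = acc1;
}

// Sanity check: both variants must produce the same lane values.
void check_variants_agree() {
    __m256d x[2] = {{1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}};
    __m256d y[2] = {{2.0, 2.0, 2.0, 2.0}, {1.0, 1.0, 1.0, 1.0}};
    __m256d zero = {0.0, 0.0, 0.0, 0.0};
    __m256d out[2] = {zero, zero};
    Vec4 xw[2] = {Vec4(x[0]), Vec4(x[1])};
    Vec4 yw[2] = {Vec4(y[0]), Vec4(y[1])};
    Vec4 outw[2] = {Vec4(zero), Vec4(zero)};

    kernel_raw(x, y, 2, out);
    kernel_wrapped(xw, yw, 2, outw);

    for (int j = 0; j < 2; ++j)
        for (int k = 0; k < 4; ++k)
            assert(out[j][k] == outw[j].v[k]);
}
```

Compiling a file like this with the command line given above and comparing the assembler emitted for kernel_raw and kernel_wrapped is one way to check whether the wrapper changes the generated loop.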