https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080
            Bug ID: 62080
           Summary: Suboptimal code generation with eigen library
           Product: gcc
           Version: 4.8.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: beschindler at gmail dot com

Created attachment 33281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33281&action=edit
Source code used to get the provided assembly

I'm currently optimizing some code that uses the Eigen library, and I have stumbled over an interesting problem. I have a function which I wrote in two different ways. The attributes are there to provide optimization barriers, and dimEigen is a member variable of the containing class. GCC requires attributes to precede the declarator in a function definition, so they are written first here:

    __attribute__((noinline, noclone))
    void eigenClamp(Eigen::Vector4i& vec)
    {
        vec = vec.array().min(dimEigen.array()).max(Eigen::Array4i::Zero());
    }

    __attribute__((noinline, noclone))
    void eigenClamp2(Eigen::Vector4i& vec)
    {
        vec = vec.array().min(dimEigen.array());
        vec = vec.array().max(Eigen::Array4i::Zero());
    }

I'm compiling this on a Core i7 920 using

    -O2 -fno-exceptions -fno-rtti -std=c++11 -march=native

The first function generates this assembly, which looks great:

    movdqu  (%rsi), %xmm1
    movdqu  (%rdi), %xmm0
    pminsd  %xmm1, %xmm0
    pxor    %xmm1, %xmm1
    pmaxsd  %xmm1, %xmm0
    movdqa  %xmm0, (%rsi)

The second version does this:

    movdqa  (%rsi), %xmm0
    pminsd  (%rdi), %xmm0
    movdqa  %xmm0, (%rsi)    <--
    pxor    %xmm0, %xmm0
    movdqu  (%rsi), %xmm1    <--
    pmaxsd  %xmm1, %xmm0
    movdqa  %xmm0, (%rsi)

It seems that, because the original source has two statements, the result of the first expression is written to memory and then, two instructions later, read back from the same address (the lines marked with <--). In my measurements, this makes the function almost 50% slower. Since I find the latter code much easier to read than the former, it would be great if the same assembly were generated for both.

I also note that in the second version pminsd takes its source operand directly from memory, while in the first version the operand is loaded into a register before pminsd is executed.
Thus, I'd love to see this code, which keeps the clamped value in %xmm0 and uses %xmm1 only for the zero vector:

    movdqu  (%rsi), %xmm0
    pminsd  (%rdi), %xmm0
    pxor    %xmm1, %xmm1
    pmaxsd  %xmm1, %xmm0
    movdqa  %xmm0, (%rsi)

As a reference, I'm attaching the complete source code and the generated assembly.
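
To make the intended semantics explicit, here is a minimal Eigen-free sketch of the two variants. It is only an illustration, not the attached source: Vec4i, clampFused, clampTwoStep and selfCheck are hypothetical names, and dim is passed as a parameter instead of being the dimEigen member. Both versions compute the same element-wise clamp to [0, dim]; the only difference is that the second one writes the intermediate result back to vec before the max step, which is the store/reload pattern visible in the second assembly listing.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Eigen::Vector4i (assumption, not from the report).
using Vec4i = std::array<int, 4>;

// Single expression: min against dim, then max against zero, one store per element.
void clampFused(Vec4i& vec, const Vec4i& dim) {
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i] = std::max(std::min(vec[i], dim[i]), 0);
}

// Two statements: the intermediate minimum is stored to vec, then read again.
void clampTwoStep(Vec4i& vec, const Vec4i& dim) {
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i] = std::min(vec[i], dim[i]);
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i] = std::max(vec[i], 0);
}

// Usage: both variants should agree on the same input.
void selfCheck() {
    const Vec4i dim{10, 20, 30, 40};
    Vec4i a{-5, 25, 7, 40};
    Vec4i b = a;
    clampFused(a, dim);
    clampTwoStep(b, dim);
    assert(a == b);
}
```

Since the observable result is identical, the request in this report is only about the generated code: ideally the optimizer would treat the two-statement form like the fused one.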