http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52034

             Bug #: 52034
           Summary: __builtin_copysign optimization suboptimal
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: drepper....@gmail.com


The most trivial __builtin_copysign optimization is not optimal:

    double f(double a, double b)
    {
      return __builtin_copysign(a,b);
    }

With gcc 4.6.2 this gets compiled to

    movapd  %xmm1, %xmm2
    andpd   .LC0(%rip), %xmm0
    andpd   .LC1(%rip), %xmm2
    orpd    %xmm2, %xmm0
    ret

There is no reason for %xmm1 to be duplicated to %xmm2.  This is sufficient:

    andpd   .LC0(%rip), %xmm0
    andpd   .LC1(%rip), %xmm1
    orpd    %xmm1, %xmm0
    ret

The same happens with more complicated code sequences.
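
For readers unfamiliar with the and/and/or pattern, here is a minimal C sketch of what the sequence computes. The contents of .LC0 and .LC1 are not shown in this report; the sketch assumes they are the usual IEEE 754 binary64 masks (.LC0 clears the sign bit, .LC1 keeps only the sign bit). The function name copysign_bits is hypothetical and is not part of GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC code: copysign expressed as the same
   and/and/or mask sequence. Assumes double is IEEE 754 binary64. */
static double copysign_bits(double a, double b)
{
    /* Assumed mask values; the report does not show .LC0/.LC1. */
    const uint64_t sign_mask = UINT64_C(0x8000000000000000);
    uint64_t ua, ub;
    double r;

    /* memcpy reinterprets the bits without violating aliasing rules. */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    ua &= ~sign_mask;  /* like andpd .LC0: keep the magnitude of a */
    ub &= sign_mask;   /* like andpd .LC1: keep only the sign of b */
    ua |= ub;          /* like orpd: combine magnitude and sign    */

    memcpy(&r, &ua, sizeof r);
    return r;
}
```

Under these assumptions b is only ever read and masked once, and its original value is not needed afterwards in f, which is consistent with the report's point that the copy into %xmm2 is unnecessary.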