http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57962
Bug ID: 57962
Summary: Missed Optimization for Superword Level Parallelism
Product: gcc
Version: 4.7.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Created attachment 30541
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30541&action=edit
Sample code.

GCC 4.7.3 and 4.8.1 both miss an optimization when compiling the attached test case using:

    gcc -Ofast -march=corei7-avx test.c -S

By loading the components of ul and fl into the bottom half of an xmm register and ur and fr into the corresponding top half, it is possible to compute disf_inv_impl(ul, fl) and disf_inv_impl(ur, fr) in a single hit. Horizontal instructions can then be used to add the various fl and fr components together.
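
To illustrate the packing described above, here is a minimal sketch using GCC's generic vector extensions. It is not the attached test case: the body of disf_inv_impl is not reproduced in this report, so kernel_scalar below is a hypothetical stand-in with made-up arithmetic, and the assumptions that each of ul, ur, fl and fr holds two floats and that the final step adds fl[i] to fr[i] are mine. The names combine_scalar and combine_packed are likewise invented for illustration.

```c
/* Four floats in one 128-bit vector, i.e. the width of an xmm register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical stand-in for disf_inv_impl(u, f); the real body lives in
 * the attachment.  Each output component depends only on the matching
 * input component, which keeps the packed version simple. */
static void kernel_scalar(const float u[2], float f[2])
{
    f[0] = u[0] * u[0] + 0.5f * u[0];
    f[1] = u[1] * u[1] + 0.5f * u[1];
}

/* Scalar form: two independent kernel calls, then a component-wise sum
 * (assumed combination step). */
static void combine_scalar(const float ul[2], const float ur[2], float out[2])
{
    float fl[2], fr[2];

    kernel_scalar(ul, fl);
    kernel_scalar(ur, fr);
    out[0] = fl[0] + fr[0];
    out[1] = fl[1] + fr[1];
}

/* Hand-packed form: ul occupies the bottom two lanes and ur the top two,
 * so a single sequence of packed operations evaluates both kernels. */
static void combine_packed(const float ul[2], const float ur[2], float out[2])
{
    v4sf u = { ul[0], ul[1], ur[0], ur[1] };
    v4sf half = { 0.5f, 0.5f, 0.5f, 0.5f };
    v4sf f = u * u + half * u;   /* lanes: fl[0], fl[1], fr[0], fr[1] */

    /* Fold the top half onto the bottom half. */
    out[0] = f[0] + f[2];
    out[1] = f[1] + f[3];
}
```

Calling combine_scalar and combine_packed with the same inputs should give the same results. Comparing the assembly generated for the two functions with the command line above is one way to check whether the compiler finds the packed form on its own.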