http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57962

            Bug ID: 57962
           Summary: Missed Optimization for Superword Level Parallelism
           Product: gcc
           Version: 4.7.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org

Created attachment 30541
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30541&action=edit
Sample code.

GCC 4.7.3 and 4.8.1 both miss an optimization when compiling the attached test
case using:

  gcc -Ofast -march=corei7-avx test.c -S

By loading the components of ul and fl into the bottom half of an xmm register
and those of ur and fr into the corresponding top half, it is possible to
compute disf_inv_impl(ul, fl) and disf_inv_impl(ur, fr) in a single pass.
Horizontal add instructions can then be used to sum the corresponding fl and
fr components.
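
As a rough illustration of the packing described above, here is a minimal
sketch using GCC's generic vector extension. The attachment is not reproduced
here, so flux_scalar is a hypothetical stand-in for disf_inv_impl, and the
assumption of two float components per state, the helper names, and the lane
layout are all illustrative only.

```c
/* Four floats in one 128-bit vector (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical stand-in for disf_inv_impl; NOT the function from the
   attachment.  Maps a two-component state u to a two-component flux f. */
static void flux_scalar(const float u[2], float f[2])
{
    f[0] = u[0] * u[1];
    f[1] = u[0] * u[0] + u[1];
}

/* Reference version: two independent scalar evaluations, then a sum. */
static void sum_scalar(const float ul[2], const float ur[2], float out[2])
{
    float fl[2], fr[2];
    flux_scalar(ul, fl);
    flux_scalar(ur, fr);
    out[0] = fl[0] + fr[0];
    out[1] = fl[1] + fr[1];
}

/* Packed version: lanes 0-1 carry the left state, lanes 2-3 the right
   state, so both evaluations share one multiply and one add. */
static void sum_packed(const float ul[2], const float ur[2], float out[2])
{
    v4sf a = { ul[0], ul[0], ur[0], ur[0] };
    v4sf b = { ul[1], ul[0], ur[1], ur[0] };
    v4sf c = { 0.0f,  ul[1], 0.0f,  ur[1] };

    /* Lanes now hold fl[0], fl[1], fr[0], fr[1]. */
    v4sf f = a * b + c;

    /* Cross-half reduction: add each left lane to its right counterpart. */
    out[0] = f[0] + f[2];
    out[1] = f[1] + f[3];
}
```

For ul = {1, 2} and ur = {3, 4}, both versions should yield {14, 16}.  Whether
the final reduction is emitted as a horizontal add or as a shuffle followed by
a vertical add is left to the compiler in this sketch.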
