[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)

2012-12-01 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713
--- Comment #9 from vincenzo Innocente 2012-12-01 17:49:00 UTC ---
Indeed. And now this other one also vectorizes on corei7 ("yesterday" it was OK only with AVX):
float64x4_t cross_product(float64x4_t x, float64x4_t y) {
  // yz - zy, zx - xz, xy - yx,
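The snippet above is cut off in this archive listing, so the original function body is not available here. As a rough illustration only, the following is a minimal sketch of how a cross product following the "yz - zy, zx - xz, xy - yx" pattern could be written with GCC vector extensions. The typedef for float64x4_t and the name cross_product_sketch are assumptions made for this example, not the code from the bug report.

```c
/* Assumption: float64x4_t is a vector-extension type holding four doubles.
   The definition used in the bug report is not shown in this excerpt. */
typedef double float64x4_t __attribute__((vector_size(32)));

/* Hypothetical sketch: cross product on lanes 0..2.
   Lane 3 evaluates to x[3]*y[3] - x[3]*y[3]. */
float64x4_t cross_product_sketch(float64x4_t x, float64x4_t y)
{
    /* Rotated copies of the inputs: (y, z, x, w) and (z, x, y, w). */
    float64x4_t x_yzx = { x[1], x[2], x[0], x[3] };
    float64x4_t x_zxy = { x[2], x[0], x[1], x[3] };
    float64x4_t y_yzx = { y[1], y[2], y[0], y[3] };
    float64x4_t y_zxy = { y[2], y[0], y[1], y[3] };

    /* Lane-wise: yz - zy, zx - xz, xy - yx. */
    return x_yzx * y_zxy - x_zxy * y_yzx;
}
```

Whether the compiler turns the lane rotations into shuffle instructions depends on the GCC version and target flags (for example -mavx versus a plain corei7 target), which is the kind of difference this bug discusses.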

[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)

2012-12-01 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713
--- Comment #8 from Marc Glisse 2012-12-01 16:54:08 UTC ---
(In reply to comment #5)
We seem to do better now. I see essentially the same code for the vector and loop versions. The main issue left is for dfma8*, copying the result to the

[Bug middle-end/50713] SLP vs loop: code generated differs (SLP less efficient)

2012-10-25 Thread vincenzo.innocente at cern dot ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713
vincenzo Innocente changed:
What      |Removed           |Added
Component |tree-optimization |middle-end
Versi