http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60702
arturomdn at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arturomdn at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #16 from arturomdn at gmail dot com 2013-02-14 17:42:55 UTC ---
With the -ftree-vectorize -fno-tree-loop-if-convert flags, it generated this
for the loop in question:
.L39:
    movq    %rdi, %rdx
    addq    (%rsi,%rax,8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #14 from arturomdn at gmail dot com 2013-02-14 17:30:54 UTC ---
I also ran the experiment, with the same results: it got faster, but not as
fast as the version with a conditional branch instead of conditional moves:
./by-ref
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #10 from arturomdn at gmail dot com 2013-02-14 16:43:23 UTC ---
Might be worth mentioning here what I said in the Stack Overflow answer: in
this particular case the entire conditional branch can be avoided because it is
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #9 from arturomdn at gmail dot com 2013-02-14 16:00:49 UTC ---
I found in the Intel optimization guide an example of this idiom of comparing
once and issuing two cmov back-to-back... so the problem isn't the two cmov,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #8 from arturomdn at gmail dot com 2013-02-14 15:53:15 UTC ---
It is possible (just a guess) that the extra compare is causing an interlock in
the processor since the first cmov is issued speculatively and the condition
won'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #3 from arturomdn at gmail dot com 2013-02-13 20:29:12 UTC ---
Intel Xeon X5570 @ 2.93GHz
(In reply to comment #2)
> Which target is this on? On some (most non x86 targets) conditional moves are
> faster than compa
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #1 from arturomdn at gmail dot com 2013-02-13 20:22:51 UTC ---
Created attachment 29443
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29443
Self contained source file with parameter x passed by reference (fast)
T
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
Bug #: 56309
Summary: -O3 optimizer generates conditional moves instead of
compare and branch resulting in almost 2x slower code
Classification: Unclassified
Product: gcc