http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52459
Bug #: 52459
Summary: [x86] loop vectorization performance very bad (worse
than -O0) when using sse4.2 popcnt
Classification: Unclassified
Product: gcc
Version: 4.6.3
--- Comment #1 from M8R-ynb11d at mailinator dot com 2012-03-02 07:11:47 UTC ---
Similar (but much slower) results when not using SSE and using the libgcc
library version of __builtin_popcount:
-O0: 22.55 secs
-O1: 20.57 secs
-O2: 22.48 secs
-Os
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: M8R-ynb11d at mailinator dot com
Target Milestone: ---
Created attachment 40766
--> https://gcc.gnu.org/bugzi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79587
--- Comment #1 from M8R-ynb11d at mailinator dot com ---
Created attachment 40767
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40767&action=edit
profile data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62642
--- Comment #5 from M8R-ynb11d at mailinator dot com ---
I originally put the barriers there in a futile attempt to work around the bug.
Can anyone tell me whether I actually need them, or whether the intrinsic
carries with it an implicit built
Assignee: unassigned at gcc dot gnu.org
Reporter: M8R-ynb11d at mailinator dot com
given:
unsigned long long measure(void (*func)(void))
{
    unsigned long long before = __builtin_ia32_rdtsc();
    asm volatile("" ::: "memory");
    func();
    asm vol
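
For illustration only, here is a self-contained sketch of a timing helper of
this shape, assuming an x86 target and GCC or Clang. The workload busy_work,
the counter sink, and the placement of the second barrier and second rdtsc
read are assumptions for the example and are not taken from the report. The
sketch uses the __asm__ spelling so that it also compiles in strict ISO C
modes.

```c
/* Hypothetical workload: 'sink' and 'busy_work' are illustrative names only. */
static volatile unsigned long long sink;

static void busy_work(void)
{
    for (unsigned long long i = 0; i < 100000ULL; i++)
        sink += i;
}

/* Returns the TSC delta observed around one call to func (x86 only). */
unsigned long long measure(void (*func)(void))
{
    unsigned long long before = __builtin_ia32_rdtsc();
    /* Empty asm with a "memory" clobber: a compiler-level barrier only.
       It emits no instruction and does not serialize the CPU. */
    __asm__ volatile("" ::: "memory");
    func();
    __asm__ volatile("" ::: "memory");
    unsigned long long after = __builtin_ia32_rdtsc();
    return after - before;
}
```

Calling measure(busy_work) returns the difference between the two counter
reads. Because rdtsc is not a serializing instruction, the result is at best
an approximate cycle count for the call.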
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: M8R-ynb11d at mailinator dot com
struct foo {
    int &ref;
    foo(int &i) : ref(i) {}
};

foo make_foo()
{
    int x = 42;
    return foo(x);
}

int func()
{