[Bug other/51041] New: g++ strange optimisation behaviour
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

Bug #: 51041
Summary: g++ strange optimisation behaviour
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: fb.programm...@gmail.com

The attached code repeatedly executes a vector * vector product to test the performance of the system. Compiled with g++ -Wall -O2 file.cpp, it achieves about 1.7 GFlops on an Intel i5-750, i.e. the output is:

adding: 0.059 s, 1.695 GFlops, sum=0.00

However, when another printf is added (by removing the comment in front of the last printf), performance drops sharply with the same compiler options:

adding: 0.195 s, 0.512 GFlops, sum=0.00
sum=0.00

It seems the last printf completely confuses the compiler's optimisation, although it shouldn't make any difference, since the same variable is already printed a few lines above. This is worrying, as it suggests the compiler can fail to fully optimise code under odd circumstances.

I've used compiler version 4.6.2 as well as 4.4.1, which is the default compiler on the system.

$ gcc-4.6.2 --version
gcc-4.6.2 (GCC) 4.6.2
$ gcc --version
gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]
$ uname -a
Linux localhost 2.6.31.14-0.8-desktop #1 SMP PREEMPT 2011-04-06 18:09:24 +0200 x86_64 x86_64 x86_64 GNU/Linux
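The attached test program is not reproduced in this thread. Below is a rough sketch of the kind of benchmark described. It assumes a plain dot-product kernel and the usual count of 2n flops per call (one multiply and one add per element). The names `dot`, `gflops` and `run_benchmark`, and the zero-filled input data, are hypothetical. Both reported measurements correspond to roughly 10^8 flops per run (0.059 s × 1.695 GFlops ≈ 0.195 s × 0.512 GFlops ≈ 0.1 × 10^9).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Inner product of two equally sized vectors: n multiplies + n adds = 2n flops.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += x[i] * y[i];
    return sum;
}

// Convert an operation count and an elapsed time in seconds into GFlops.
double gflops(double flops, double seconds) {
    return flops / seconds / 1e9;
}

// Hypothetical harness: repeat dot() `reps` times and print a line in the
// same format as the report. Inputs are zero-filled (an assumption), so the
// printed sum is 0.00.
double run_benchmark(std::size_t n, std::size_t reps) {
    std::vector<double> x(n, 0.0), y(n, 0.0);
    double sum = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < reps; ++r)
        sum += dot(x, y);
    auto t1 = std::chrono::steady_clock::now();
    double s = std::chrono::duration<double>(t1 - t0).count();
    std::printf("adding: %.3f s, %.3f GFlops, sum=%.2f\n",
                s, gflops(2.0 * n * reps, s), sum);
    return sum;
}
```

For example, `run_benchmark(1000000, 50)` performs 10^8 flops under this counting. The timings it prints depend entirely on the machine and compiler.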
[Bug other/51041] g++ strange optimisation behaviour
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

--- Comment #1 from fb.programming at gmail dot com 2011-11-08 22:20:53 UTC ---
Created attachment 25761 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25761
performance test doing vec*vec calc
[Bug tree-optimization/51499] New: vectorizer missing simple case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

Bug #: 51499
Summary: vectorizer missing simple case
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: fb.programm...@gmail.com

The SSE vectorizer seems to miss one of the simplest cases:

#include
#include
double loop(double a, size_t n){
   // initialise differently so compiler doesn't simplify
   double sum1=0.1, sum2=0.2, sum3=0.3, sum4=0.4, sum5=0.5, sum6=0.6;
   for(size_t i=0; i
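The listing above is cut off in this archive: the header names and the rest of the loop are missing. The later comments quote `sum1+=a; sum2+=a;` and expect three `addpd` per iteration for six sums, which suggests the loop adds `a` to each of the six accumulators. The following is a hedged reconstruction. Only the signature, the comment and the initial values come from the report. The loop body and the return statement are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reconstruction: the loop body and the return value are assumed.
double loop(double a, std::size_t n) {
    // initialise differently so compiler doesn't simplify
    double sum1 = 0.1, sum2 = 0.2, sum3 = 0.3, sum4 = 0.4, sum5 = 0.5, sum6 = 0.6;
    for (std::size_t i = 0; i < n; ++i) {
        // six independent accumulators, each updated once per iteration
        sum1 += a; sum2 += a; sum3 += a;
        sum4 += a; sum5 += a; sum6 += a;
    }
    return sum1 + sum2 + sum3 + sum4 + sum5 + sum6;
}
```

Each accumulator depends only on its own previous value. Packing pairs of them into one SSE register therefore performs exactly the same additions in the same order.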
[Bug tree-optimization/51499] vectorizer missing simple case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #2 from fb.programming at gmail dot com 2011-12-11 08:33:40 UTC ---
(In reply to comment #1)

g++-4.6.2 -S -Wall -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 \
    -ffast-math -fno-vect-cost-model

gives me exactly the same assembly code as above (which surprises me a bit, as -funsafe-math-optimizations might as well have eliminated the loop completely).

However, I would expect the optimal assembly to be something like:

.L3:
    addq    $1, %rax
    addpd   %xmm0, %xmm3
    cmpq    %rdi, %rax
    addpd   %xmm0, %xmm2
    addpd   %xmm0, %xmm1
    jne     .L3

where the vector (sum1,sum2) is stored in xmm1, (sum3,sum4) in xmm2, etc., and (a,a) in xmm0. This speeds it up by a factor of 2 and is completely equivalent to the scalar case, so I don't see why -ffast-math (which implies -funsafe-math-optimizations) should be necessary in this case either.
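The hand-written loop above can be expressed with SSE2 intrinsics from `<emmintrin.h>`. This sketch covers x86/x86-64 only, and the function names are hypothetical. It keeps (sum1,sum2), (sum3,sum4) and (sum5,sum6) in three packed registers and adds (a,a) to each, mirroring the three `addpd` instructions. The test checks the claim that this is equivalent to the scalar code: every lane receives the same additions in the same order, so the results should be bitwise identical when compiled without -ffast-math.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics (x86/x86-64 only)

// Scalar form: six independent accumulators, one addition each per iteration.
void sums_scalar(double a, std::size_t n, double out[6]) {
    double s[6] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6};
    for (std::size_t i = 0; i < n; ++i)
        for (int k = 0; k < 6; ++k)
            s[k] += a;
    for (int k = 0; k < 6; ++k)
        out[k] = s[k];
}

// Packed form: three addpd per iteration, like the .L3 loop above.
void sums_packed(double a, std::size_t n, double out[6]) {
    const __m128d va = _mm_set1_pd(a);   // (a, a), the role of xmm0
    __m128d v12 = _mm_set_pd(0.2, 0.1);  // low lane sum1, high lane sum2
    __m128d v34 = _mm_set_pd(0.4, 0.3);  // (sum3, sum4)
    __m128d v56 = _mm_set_pd(0.6, 0.5);  // (sum5, sum6)
    for (std::size_t i = 0; i < n; ++i) {
        v12 = _mm_add_pd(v12, va);
        v34 = _mm_add_pd(v34, va);
        v56 = _mm_add_pd(v56, va);
    }
    _mm_storeu_pd(out + 0, v12);         // low lane goes to out[0]
    _mm_storeu_pd(out + 2, v34);
    _mm_storeu_pd(out + 4, v56);
}
```

No reassociation takes place here. That is why one would expect a vectorizer to be allowed to do the same thing without -fassociative-math.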
[Bug tree-optimization/51499] vectorizer missing simple case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #4 from fb.programming at gmail dot com 2011-12-11 11:52:30 UTC ---
Looks like there has been some great progress in gcc 4.7! Still, I think its behaviour is slightly buggy.

(1) In this case it should work without -funsafe-math-optimizations, but it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math -fassociative-math to make it work.

(2) The prediction

7: not vectorized: vectorization not profitable.

is simply wrong. Forcing vectorization with -fno-vect-cost-model shows that it speeds the loop up by a factor of 2.

(3) If I change all doubles into floats in the code above, it seems to work without forcing it (-fno-vect-cost-model):

g++-4.7 -S -Wall -O2 -ftree-vectorize -ftree-vectorizer-verbose=2 \
    -funsafe-math-optimizations test.cpp

Analyzing loop at test.cpp:7
Vectorizing loop at test.cpp:7
7: vectorizing stmts using SLP.
7: LOOP VECTORIZED.
test.cpp:4: note: vectorized 1 loops in function.

However, the assembly shows that it hasn't vectorized the loop at all:

.L11:
    addq    $1, %rax
    addss   %xmm0, %xmm3
    cmpq    %rax, %rdi
    addss   %xmm0, %xmm4
    addss   %xmm0, %xmm7
    addss   %xmm0, %xmm6
    addss   %xmm0, %xmm5
    addss   %xmm0, %xmm1
    ja      .L11
[Bug tree-optimization/51499] vectorizer missing simple case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #7 from fb.programming at gmail dot com 2011-12-11 14:55:13 UTC ---
(In reply to comment #5)
> > (3) If I change all double's into float's in the code above it seems to
> I think you are looking at the scalar epilogue. The number of iterations is
> unknown, so we need an epilogue loop for the case that number of iterations is
> not a multiple of 4.

Yes, you're right. Sorry about that, my mistake.

> > (1) In this case it should work without -funsafe-math-optimizations but
> > it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
> > -fassociative-math to make it work.
> >
>
> It's reduction, when we vectorize we change the order of computation. In order
> to be able to do that for floating point we need flag_associative_math.

In some cases it might be necessary, but not here:

sum1+=a; sum2+=a;

gives exactly the same result as

(sum1, sum2) += (a, a);

Let's take a more applied example, say calculating the sum of 1/i:

double harmon(int n) {
   double sum=0.0;
   for(int i=1; i
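The `harmon` listing is truncated in this archive. The sketch below therefore uses its own hypothetical functions, summing 1/i for i = 1..n, to illustrate the two points from comment #5. The first point concerns reordering. Unlike six separate accumulators, a single-accumulator reduction must be split into interleaved partial sums (lanes) before it can be vectorized, and the lanes are combined at the end. That changes the order of the floating-point additions, so the result may differ in the last bits; this reordering is what flag_associative_math is needed for. The second point is the scalar epilogue: when the trip count is not a multiple of the vectorization factor (2 here, for doubles in SSE2), the leftover terms are handled by a scalar loop.

```cpp
#include <cassert>

// Single accumulator, additions in source order: 1/1 + 1/2 + ... + 1/n.
double harmon_scalar(int n) {
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 1; i <= n; ++i)
        sum += 1.0 / i;
    return sum;
}

// What a 2-wide vectorization of the same reduction amounts to: odd-indexed
// and even-indexed terms accumulate in separate lanes, a scalar epilogue adds
// a leftover term when n is odd, and the lanes are summed at the end.
double harmon_two_lanes(int n) {
    assert(n >= 0);
    double lane0 = 0.0, lane1 = 0.0;
    int i = 1;
    for (; i + 1 <= n; i += 2) {   // main "vector" loop, two terms per step
        lane0 += 1.0 / i;
        lane1 += 1.0 / (i + 1);
    }
    for (; i <= n; ++i)            // scalar epilogue for the remainder
        lane0 += 1.0 / i;
    return lane0 + lane1;
}
```

Both functions approximate the same value, but only the first matches the source order exactly. The reporter's point is that code with one accumulator per lane already written out, such as `sum1+=a; sum2+=a;`, does not need this reordering.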
[Bug tree-optimization/51499] vectorizer missing simple case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #13 from fb.programming at gmail dot com 2011-12-12 14:20:58 UTC ---
(In reply to comment #9)
> So, you are suggesting to remove the need in flag_associative_math for fp for
> cases when a reduction computation is already unrolled by the vectorization
> factor. Sounds reasonable to me.

Yes, I think that's it: basically, only require flag_associative_math if the vectorizer changes the order of summation or multiplication. I think that is quite important, as -ffast-math / -funsafe-math-optimizations / -fassociative-math may not be acceptable for many projects.

However, I don't fully understand Richard Guenther's example. Yes, his example requires -fassociative-math to be vectorized. My example, however, would translate to something like

sum1 += a[i];
sum2 += a[i+1];

and here it doesn't matter whether it's executed this way or the other way around:

sum2 += a[i+1];
sum1 += a[i];

The second issue is simply to double-check the profitability calculation, as it wrongly decided:

7: not vectorized: vectorization not profitable.
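The array form from comment #13 can be sketched the same way. The sketch assumes x86/x86-64 with SSE2 intrinsics, and the function names are hypothetical. The two accumulators already match a vectorization factor of 2 for doubles. Loading `a[i]` and `a[i+1]` into one register and adding it to the packed (sum1, sum2) gives each lane exactly the additions it had in the scalar code, in the same order. The two versions should therefore agree bitwise, including the handling of an odd leftover element, without any reassociation.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics (x86/x86-64 only)

// Source already "unrolled by the vectorization factor": sum1 takes the
// even-indexed elements and sum2 the odd-indexed ones.
double sum_pairs_scalar(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    double sum1 = 0.0, sum2 = 0.0;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum1 += a[i];
        sum2 += a[i + 1];
    }
    if (i < n) sum1 += a[i];  // odd leftover element
    return sum1 + sum2;
}

// Packed version: (sum1, sum2) += (a[i], a[i+1]) with one addpd per step.
double sum_pairs_sse2(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    __m128d acc = _mm_setzero_pd();                  // (sum1, sum2)
    std::size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));  // low a[i], high a[i+1]
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                       // lanes[0] = sum1
    if (i < n) lanes[0] += a[i];                     // same leftover handling
    return lanes[0] + lanes[1];
}
```

By contrast, vectorizing a single accumulator `sum += a[i]` in this way would regroup the terms. That case, unlike this one, needs -fassociative-math.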