[Bug other/51041] New: g++ strange optimisation behaviour

2011-11-08 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

 Bug #: 51041
   Summary: g++ strange optimisation behaviour
Classification: Unclassified
   Product: gcc
   Version: 4.6.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: fb.programm...@gmail.com


The attached code repeatedly executes a vector * vector product to test the
performance of the system. Compiled with

 g++ -Wall -O2 file.cpp

it achieves about 1.7 GFlops on an Intel i5-750, i.e.
the output is

 adding:  0.059 s, 1.695 GFlops, sum=0.00

However, after adding another printf (uncomment the last printf), the
performance drops by more than a factor of three (same compiler
options):

 adding:  0.195 s, 0.512 GFlops, sum=0.00
 sum=0.00

It seems the last printf confuses the compiler optimisation completely,
although it shouldn't make a difference at all, as the same variable
is already printed a few lines above.
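
The attachment itself is not reproduced in this thread. Purely as an
illustration, a benchmark of this kind could look roughly like the sketch
below. The names vec_product, gflops and run_benchmark, the vector
contents and the use of std::chrono are all assumptions and are not taken
from the attached file.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical kernel: multiply two vectors element by element and
// accumulate the products (2 floating-point operations per element).
double vec_product(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Convert a flop count and an elapsed time in seconds into GFlops.
double gflops(double flops, double seconds) {
    return flops / (seconds * 1e9);
}

// Run the kernel `repeats` times on vectors of length n, print one report
// line in the style quoted in the report, and return the accumulated sum.
double run_benchmark(std::size_t n, int repeats) {
    std::vector<double> x(n, 1.0), y(n, 0.5);
    double sum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        sum += vec_product(x, y);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    std::printf("adding: %6.3f s, %.3f GFlops, sum=%.2f\n",
                seconds, gflops(2.0 * n * repeats, seconds), sum);
    return sum;
}
```

Calling run_benchmark(1000000, 100) from main and then printing the
returned sum a second time would mimic the "extra printf" variant
described above.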

This is worrying as it seems the compiler fails to fully optimise the
code under odd circumstances. I've used compiler version 4.6.2 as well as
4.4.1 which is the default compiler on the system.


$ gcc-4.6.2 --version
gcc-4.6.2 (GCC) 4.6.2

$ gcc --version
gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]

$ uname -a
Linux localhost 2.6.31.14-0.8-desktop #1 SMP PREEMPT 2011-04-06 18:09:24 +0200
x86_64 x86_64 x86_64 GNU/Linux


[Bug other/51041] g++ strange optimisation behaviour

2011-11-08 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

--- Comment #1 from fb.programming at gmail dot com 2011-11-08 22:20:53 UTC ---
Created attachment 25761
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25761
performance test doing vec*vec calc


[Bug tree-optimization/51499] New: vectorizer missing simple case

2011-12-10 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

 Bug #: 51499
   Summary: vectorizer missing simple case
Classification: Unclassified
   Product: gcc
   Version: 4.6.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: fb.programm...@gmail.com


The SSE vectorizer seems to miss one of the simplest cases:

#include 
#include 

double loop(double a, size_t n){
   // initialise differently so compiler doesn't simplify
   double sum1=0.1, sum2=0.2, sum3=0.3, sum4=0.4, sum5=0.5, sum6=0.6;
   for(size_t i=0; i

[Bug tree-optimization/51499] vectorizer missing simple case

2011-12-11 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #2 from fb.programming at gmail dot com 2011-12-11 08:33:40 UTC ---
(In reply to comment #1)

g++-4.6.2 -S -Wall -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 \
  -ffast-math  -fno-vect-cost-model

gives me exactly the same assembly code as above (which surprises me
a bit, as -funsafe-math-optimizations might just as well have eliminated
the loop completely).

The optimal assembly, however, I would expect to be something like:

.L3:
    addq    $1, %rax
    addpd   %xmm0, %xmm3
    cmpq    %rdi, %rax
    addpd   %xmm0, %xmm2
    addpd   %xmm0, %xmm1
    jne     .L3

Here the vector (sum1,sum2) is stored in xmm1, (sum3,sum4) in xmm2,
etc., and (a,a) in xmm0. This speeds it up by a factor of 2 and is
completely equivalent to the scalar case, so I don't see why
-ffast-math (which implies -funsafe-math-optimizations) should be
necessary in this case either.
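
The claimed equivalence can be checked with a small sketch. It assumes the
loop body adds a to each of the six accumulators (the body is cut off in
this thread) and uses the GCC/Clang vector_size extension rather than
compiler auto-vectorization. The names Sums, loop_scalar, loop_paired and
check_equivalence are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: two doubles in one 16-byte vector, which
// corresponds to one SSE2 register on x86-64.
typedef double v2d __attribute__((vector_size(16)));

struct Sums { double s[6]; };

// Scalar reference: six independent accumulators, each incremented by a.
// (Assumed loop body; the initial values are those from the report.)
Sums loop_scalar(double a, std::size_t n) {
    Sums r = {{0.1, 0.2, 0.3, 0.4, 0.5, 0.6}};
    for (std::size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 6; ++k) r.s[k] += a;
    }
    return r;
}

// Paired version: (sum1,sum2), (sum3,sum4), (sum5,sum6) each get (a,a)
// added. Every lane performs the same additions in the same order as the
// matching scalar accumulator, so no reassociation takes place.
Sums loop_paired(double a, std::size_t n) {
    v2d s12 = {0.1, 0.2}, s34 = {0.3, 0.4}, s56 = {0.5, 0.6};
    const v2d aa = {a, a};
    for (std::size_t i = 0; i < n; ++i) {
        s12 += aa;
        s34 += aa;
        s56 += aa;
    }
    Sums r = {{s12[0], s12[1], s34[0], s34[1], s56[0], s56[1]}};
    return r;
}

// Compare the raw bit patterns, not just the values.
bool bit_identical(const Sums& x, const Sums& y) {
    return std::memcmp(x.s, y.s, sizeof x.s) == 0;
}

// Aborts if the paired version ever differs from the scalar one.
void check_equivalence(double a, std::size_t n) {
    assert(bit_identical(loop_scalar(a, n), loop_paired(a, n)));
}
```

Compiled without any -ffast-math style flags, check_equivalence(0.1, 1000)
should pass, since each lane is just one of the original accumulators.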


[Bug tree-optimization/51499] vectorizer missing simple case

2011-12-11 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #4 from fb.programming at gmail dot com 2011-12-11 11:52:30 UTC ---
Looks like there has been some great progress in gcc 4.7!

Still, I think its behaviour is slightly buggy.

(1) In this case it should work without -funsafe-math-optimizations but
it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
   -fassociative-math to make it work.

(2) The prediction:
   7: not vectorized: vectorization not profitable.
is just wrong. Forcing it with -fno-vect-cost-model shows that it speeds up
by a factor of 2.

(3) If I change all doubles into floats in the code above, it seems to
work without forcing it (-fno-vect-cost-model):


   g++-4.7 -S -Wall -O2  -ftree-vectorize -ftree-vectorizer-verbose=2 \
   -funsafe-math-optimizations test.cpp

   Analyzing loop at test.cpp:7


   Vectorizing loop at test.cpp:7

   7: vectorizing stmts using SLP.
   7: LOOP VECTORIZED.
   test.cpp:4: note: vectorized 1 loops in function.


However, it hasn't vectorized it at all as the assembly shows:

.L11:
    addq    $1, %rax
    addss   %xmm0, %xmm3
    cmpq    %rax, %rdi
    addss   %xmm0, %xmm4
    addss   %xmm0, %xmm7
    addss   %xmm0, %xmm6
    addss   %xmm0, %xmm5
    addss   %xmm0, %xmm1
    ja      .L11


[Bug tree-optimization/51499] vectorizer missing simple case

2011-12-11 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #7 from fb.programming at gmail dot com 2011-12-11 14:55:13 UTC ---
(In reply to comment #5)

> > (3) If I change all double's into float's in the code above it seems to

> I think you are looking at the scalar epilogue. The number of iterations is
> unknown, so we need an epilogue loop for the case that number of iterations is
> not a multiple of 4.

Yes you're right. Sorry about that, my mistake.
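
For reference, the general shape of a vectorized loop with a scalar
epilogue can be sketched by hand as follows. This uses a different,
simpler loop than the one in the report (adding a constant to every
element of a float array), and the name add_to_all is hypothetical. The
vector type relies on the GCC/Clang vector_size extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: four floats in one 16-byte vector.
typedef float v4f __attribute__((vector_size(16)));

// Adds a to every element of y[0..n).
void add_to_all(float* y, std::size_t n, float a) {
    assert(y != nullptr || n == 0);
    const v4f aa = {a, a, a, a};
    std::size_t i = 0;

    // Main loop: four elements per iteration.
    for (; i + 4 <= n; i += 4) {
        v4f v;
        std::memcpy(&v, y + i, sizeof v);   // load without alignment assumptions
        v += aa;
        std::memcpy(y + i, &v, sizeof v);   // store back
    }

    // Scalar epilogue: the remaining n % 4 elements, one at a time.
    // In the generated assembly this part uses scalar instructions such
    // as addss even though the main loop is vectorized.
    for (; i < n; ++i) {
        y[i] += a;
    }
}
```

With n = 10, the main loop handles elements 0 to 7 and the epilogue
handles elements 8 and 9.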


> > (1) In this case it should work without -funsafe-math-optimizations but
> > it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
> >    -fassociative-math to make it work.
> > 
> 
> It's reduction, when we vectorize we change the order of computation. In order
> to be able to do that for floating point we need flag_associative_math.

In some cases it might be necessary but not here:

 sum1+=a;
 sum2+=a;

gives exactly the same result as

 (sum1, sum2) += (a, a);

Let's take a more applied example, say calculating the sum of 1/i:

   double harmon(int n) {
  double sum=0.0;
  for(int i=1; i

[Bug tree-optimization/51499] vectorizer missing simple case

2011-12-12 Thread fb.programming at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #13 from fb.programming at gmail dot com 2011-12-12 14:20:58 UTC ---
(In reply to comment #9)

> So, you are suggesting to remove the need in flag_associative_math for fp for
> cases when a reduction computation is already unrolled by the vectorization
> factor. Sounds reasonable to me.

Yes, I think that's it: basically, only require flag_associative_math if
the vectorizer changes the order of the sums or products. I think that
is quite important, as
 -ffast-math / -funsafe-math-optimizations / -fassociative-math
are not acceptable for many projects.

However, I don't fully understand Richard Guenther's example. Yes, his
example requires -fassociative-math to be vectorized. My example,
however, would translate to something like

  sum1 += a[i];
  sum2 += a[i+1];

and now it doesn't matter if it's executed this way or the other way
around

  sum2 += a[i+1];
  sum1 += a[i];
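
The distinction can be made concrete with a small sketch. It contrasts a
single-accumulator reduction, where splitting the work across lanes
changes the order of the additions, with two accumulators that are already
separate in the source, where swapping the two statements changes nothing.
The names sum_single, sum_two_a, sum_two_b, AccumPair and kDemo are
hypothetical, and the input values are chosen only to make the rounding
visible.

```cpp
#include <cassert>
#include <cstddef>

// Single accumulator: every element is added to the same running sum, so
// spreading the work over vector lanes reorders the additions.
double sum_single(const double* a, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

struct AccumPair { double sum1, sum2; };

// Two accumulators written out in the source, sum1 updated first.
AccumPair sum_two_a(const double* a, std::size_t n) {
    assert(n % 2 == 0);  // sketch only handles an even element count
    AccumPair p = {0.0, 0.0};
    for (std::size_t i = 0; i < n; i += 2) {
        p.sum1 += a[i];
        p.sum2 += a[i + 1];
    }
    return p;
}

// The same loop with the two statements swapped. Each accumulator still
// sees exactly the same sequence of additions.
AccumPair sum_two_b(const double* a, std::size_t n) {
    assert(n % 2 == 0);
    AccumPair p = {0.0, 0.0};
    for (std::size_t i = 0; i < n; i += 2) {
        p.sum2 += a[i + 1];
        p.sum1 += a[i];
    }
    return p;
}

// Demo input where the order of additions matters for a single sum.
const double kDemo[4] = {1e16, 1.0, -1e16, 1.0};
```

Assuming IEEE 754 double arithmetic without excess precision (as on x86-64
with SSE2), sum_single(kDemo, 4) yields 1.0, whereas both sum_two_a and
sum_two_b yield sum1 = 0.0 and sum2 = 2.0. The two-accumulator variants
agree with each other bit for bit, which is the case where reassociation
is not involved.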

The second issue is that the profitability calculation should be
double-checked, as it wrongly decided:

  7: not vectorized: vectorization not profitable.