------- Comment #11 from yuri at tsoft dot com  2005-12-20 07:40 -------
Subject: Re: REGREGRESSION: SSE2 vectorized code is many times slower on 4.x.x than on 3.4.4

Now the huge runtime difference has disappeared, but 4.0.2-generated code is
consistently 20% or more slower. It contains many memory accesses that are not
needed at all and that did not exist in the 3.4.4 output.
I tried -march=i686 and -march=k8; both are slower than 3.4.4.
Do I also have to recompile gcc with some special options?

Yuri

pinskia at gcc dot gnu dot org wrote:

>------- Comment #10 from pinskia at gcc dot gnu dot org  2005-12-20 06:55 -------
>Oh, I looked a little more and yes it depends on the arch you are building for
>but only for 4.x.
>
>Since you are using SSE, you should add also -march=i686 or -march=k8 so that
>the code is also tuned for the processor you are using.
>
>Anyways the problem with i386 with 4.0 is really just PR 14295.
>

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500
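
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness. The kernel is a hypothetical stand-in written with SSE2 intrinsics, not the code from this report, and the names add_pd, time_kernel and N are made up for illustration.

```c
/* Hypothetical SSE2 test case; NOT the code from this bug report. */
#include <emmintrin.h>
#include <stdio.h>
#include <time.h>

#define N 1024

/* Adds two double arrays, two lanes at a time, using SSE2 intrinsics. */
void add_pd(const double *a, const double *b, double *out, int n)
{
    int i;
    for (i = 0; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned loads */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
    for (; i < n; i++)                      /* scalar tail for odd n */
        out[i] = a[i] + b[i];
}

/* Runs the kernel `reps` times and returns the elapsed CPU time in seconds. */
double time_kernel(int reps)
{
    static double a[N], b[N], out[N];
    double sink = 0.0;
    clock_t start;
    int i, r;

    for (i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }
    start = clock();
    for (r = 0; r < reps; r++) {
        add_pd(a, b, out, N);
        sink += out[r % N];   /* use the result so the loop is not removed */
    }
    printf("checksum %g\n", sink);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel from a small main and building the same file with each compiler, for example with -O2 -msse2 and then again with -march=k8 added, should show whether the slowdown depends on the tuning option. Comparing the output of -S between the two compilers is one way to look for the extra memory accesses.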