------- Comment #15 from jamborm at gcc dot gnu dot org  2010-06-14 12:39 
-------
(In reply to comment #14)
> SSE performance is fine again, thanks a lot!
> 
> One more question, if that's OK...
> Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's
> about half a second for ARRSZ=1024, but almost 10 seconds for ARRSZ=20 on my
> machine, which is extremely strange because the operation count is the same in
> both cases. I suspect that something weird is happening with respect to the
> cache and prefetching. Should I open another PR for this?
> 

The generated assembly is the same for the two cases, except that the
offsets are, of course, much smaller.  This means that the lpic and
pre1 arrays lie much closer to each other in memory, which may be
something the processor does not like.  I find this surprising, but
unless you can think of a specific missed optimization opportunity (I
can't), I don't think it is PR material.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423
