------- Comment #15 from jamborm at gcc dot gnu dot org 2010-06-14 12:39 -------
(In reply to comment #14)
> SSE performance is fine again, thanks a lot!
>
> One more question, if that's OK...
> Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's
> about half a second for ARRSZ=1024, but almost 10 seconds for ARRSZ=20 on my
> machine, which is extremely strange because the operation count is the same in
> both cases. I suspect that something weird is happening with respect to the
> cache and prefetching. Should I open another PR for this?
>
The generated assembly is the same for the two cases, except that the offsets
are much smaller in the ARRSZ=20 case, of course. This means that the lpic and
pre1 arrays are much closer to each other, which may be something the
processor does not like. I find this surprising, but unless you can think of a
specific missed optimization opportunity (I can't), I don't think this is PR
material.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423