------- Comment #5 from changpeng dot fang at amd dot com  2010-08-24 22:13 -------
For the test case in comment #2, if we don't vectorize the loop, the
unroll_factor is incorrectly determined to be 1. The resulting
insns-to-prefetch ratio (4) then prevents prefetching, so there is no
performance regression.

If we vectorize the loop, the prefetch_mod is smaller than the
upper_bound, so the unroll_factor is determined to be 4. The
insns-to-prefetch ratio is then big enough to allow prefetches, which
causes the 5% regression for 482.sphinx3.
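
To make the interaction between the two cases concrete, here is a rough
sketch of the two decisions as I understand them. It is not the GCC source:
the function names, the greedy LCM rule, and every number except the
resulting unroll factors (1 and 4) are assumptions chosen for illustration.
In particular, the prefetch_mod values 16 and 4, the upper_bound of 8, the
6 insns, the 2 prefetches, and the threshold of 9 are all hypothetical.

```python
from math import gcd


def determine_unroll_factor(prefetch_mods, upper_bound):
    """Greedily accumulate the LCM of the prefetch_mod values.

    A prefetch_mod that would push the factor past upper_bound is skipped,
    so a single large prefetch_mod leaves the factor at 1.
    (Assumed model of the heuristic, not the actual GCC code.)
    """
    factor = 1
    for mod in prefetch_mods:
        candidate = factor * mod // gcd(factor, mod)  # least common multiple
        if candidate <= upper_bound:
            factor = candidate
    return factor


def prefetching_allowed(ninsns, unroll_factor, prefetch_count, min_ratio):
    """Allow prefetching only if the unrolled body has enough insns per prefetch."""
    ratio = (unroll_factor * ninsns) // prefetch_count
    return ratio >= min_ratio


# Hypothetical inputs: the scalar loop has prefetch_mod 16, the vectorized
# loop has prefetch_mod 4, and upper_bound is 8 in both cases.
scalar_factor = determine_unroll_factor([16], upper_bound=8)
vector_factor = determine_unroll_factor([4], upper_bound=8)

for label, factor in (("scalar", scalar_factor), ("vectorized", vector_factor)):
    allowed = prefetching_allowed(ninsns=6, unroll_factor=factor,
                                  prefetch_count=2, min_ratio=9)
    print(label, "unroll_factor =", factor, "prefetch allowed =", allowed)
```

With these made-up inputs the scalar loop keeps unroll_factor 1 and fails
the ratio test, while the vectorized loop gets unroll_factor 4 and passes
it, which is the pattern described above.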

This regression would also have occurred with -fno-tree-vectorize if
the unroll factor were set correctly. The actual problem is
the unrolling itself: there is no regression if I just insert
the prefetch and do not unroll the loop at all.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391
