------- Comment #5 from changpeng dot fang at amd dot com 2010-08-24 22:13 -------

For the test case in comment #2, if we don't vectorize the loop, the unroll_factor is incorrectly determined as 1, and the insns-to-prefetch ratio (4) then prevents prefetching, so there is no performance regression.
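To make the interaction concrete, here is a hypothetical, simplified model of the two heuristics involved. It is not GCC's code. The function names, the LCM-based choice of unroll factor, the prefetch-count formula, the `min_ratio` threshold and all example numbers are assumptions for illustration. The model shows that the insns-to-prefetch ratio grows with the unroll factor, so the same loop can fail the check at unroll 1 and pass it at unroll 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL model, not GCC's implementation. */

unsigned gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

unsigned lcm_u (unsigned a, unsigned b)
{
  assert (a > 0 && b > 0);
  return a / gcd_u (a, b) * b;
}

/* Assumed rule: grow the unroll factor to the LCM of each reference's
   prefetch_mod, but only while the result stays <= upper_bound.
   A prefetch_mod larger than upper_bound therefore leaves it at 1.  */
unsigned model_unroll_factor (const unsigned *prefetch_mod, size_t nrefs,
                              unsigned upper_bound)
{
  unsigned factor = 1;
  if (upper_bound <= 1)
    return 1;
  for (size_t i = 0; i < nrefs; i++)
    {
      unsigned nfactor = lcm_u (prefetch_mod[i], factor);
      if (nfactor <= upper_bound)
        factor = nfactor;
    }
  return factor;
}

/* Assumed count of prefetches per iteration of the unrolled body:
   a reference with prefetch_mod M needs one prefetch per M copies
   (rounded up).  */
unsigned model_prefetch_count (const unsigned *prefetch_mod, size_t nrefs,
                               unsigned unroll_factor)
{
  unsigned count = 0;
  for (size_t i = 0; i < nrefs; i++)
    count += (unroll_factor + prefetch_mod[i] - 1) / prefetch_mod[i];
  return count;
}

/* Insns of the unrolled body divided by the prefetches it issues.  */
unsigned model_insn_to_prefetch_ratio (unsigned ninsns,
                                       const unsigned *prefetch_mod,
                                       size_t nrefs, unsigned unroll_factor)
{
  unsigned count = model_prefetch_count (prefetch_mod, nrefs, unroll_factor);
  assert (count > 0);
  return unroll_factor * ninsns / count;
}

/* Prefetching is allowed only if the ratio reaches min_ratio
   (a hypothetical threshold, passed in by the caller).  */
bool model_prefetch_allowed (unsigned ninsns, const unsigned *prefetch_mod,
                             size_t nrefs, unsigned unroll_factor,
                             unsigned min_ratio)
{
  return model_insn_to_prefetch_ratio (ninsns, prefetch_mod, nrefs,
                                       unroll_factor) >= min_ratio;
}
```

With made-up numbers (a 4-insn body, one reference, `upper_bound` = 8, `min_ratio` = 9): a scalar reference with prefetch_mod 16 exceeds the bound, so the unroll factor stays 1, the ratio is 4, and prefetching is rejected. A vectorized reference with prefetch_mod 4 fits, so the unroll factor becomes 4, the ratio becomes 16, and prefetching is accepted. The body and prefetch count are unchanged between the two cases; only unrolling flips the decision.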
If we vectorize the loop, the prefetch_mod is smaller than the upper_bound, so the unroll_factor is determined as 4. The insns-to-prefetch ratio is then large enough to allow prefetching, which causes the 5% regression in 482.sphinx3. The same regression would also occur with -fno-tree-vectorize if the unroll factor were set correctly. The actual problem is the unrolling itself: there is no regression if I just insert the prefetches and do not unroll the loop at all.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391