------- Comment #13 from changpeng dot fang at amd dot com 2010-06-30 00:23 -------
Here is the current status of this work:

patch1: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html
patch2: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg03049.html

On my system with -O3 zero_sized_1.f90 -fprefetch-loop-arrays -fno-unroll-loops --param max-completely-peeled-insns=2000:
original timing:        5m30s
with patch1:            1m20s
with patch1 + patch2:   1m03s
without prefetch:       0m30s

Even after the two patches, the timing with -fprefetch-loop-arrays is still about double the timing without it. The extra 33s is mostly spent in dependence computation for loops. For this test case, prefetching is the only optimization that invokes "compute_all_dependences". I am not sure whether we should tolerate this timing increase with aggressive peeling and prefetching, or whether we should work on reducing the cost of dependence computation.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576