In PR53346 we vectorize a simple memset loop very inefficiently. But of course we should have detected this and transformed the loop into a memset! It seems we only do that when the original loop does something other than the memset as well.
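For illustration, the transformation we want looks roughly like the following. This is a hand-written sketch, not compiler output; the function names zero_loop and zero_memset are made up for the example, and the n > 0 guard is only my stand-in for the loop's entry test.

```c
#include <string.h>

/* The kind of loop in question: store zero to every element.  */
void zero_loop (int *p, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    p[i] = 0;
}

/* Hand-written equivalent of what pattern distribution should emit:
   a single memset covering n elements.  The guard mirrors the loop
   not executing at all for n <= 0.  */
void zero_memset (int *p, int n)
{
  if (n > 0)
    memset (p, 0, (size_t) n * sizeof (int));
}
```

Both functions leave the array in the same state; the second form avoids vectorizing the store loop at all.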
Fixed as follows.  Bootstrap and regtest on x86_64-unknown-linux-gnu
ongoing (I suppose that will really stress loop distribution now ;))

Richard.

2012-05-18  Richard Guenther  <rguent...@suse.de>

        PR tree-optimization/53346
        * tree-loop-distribution.c (ldist_gen): Make sure to apply builtin
        transform even when only a single partition with all reads/writes
        exists.

        * gcc.dg/tree-ssa/ldist-18.c: New testcase.

Index: gcc/tree-loop-distribution.c
===================================================================
*** gcc/tree-loop-distribution.c        (revision 187650)
--- gcc/tree-loop-distribution.c        (working copy)
*************** ldist_gen (struct loop *loop, struct gra
*** 1131,1138 ****
    BITMAP_FREE (processed);
    nbp = VEC_length (bitmap, partitions);
  
!   if (nbp <= 1
!       || partition_contains_all_rw (rdg, partitions))
      goto ldist_done;
  
    if (dump_file && (dump_flags & TDF_DETAILS))
--- 1131,1141 ----
    BITMAP_FREE (processed);
    nbp = VEC_length (bitmap, partitions);
  
!   if (nbp == 0
!       || (nbp == 1
!           && !can_generate_builtin (rdg, VEC_index (bitmap, partitions, 0)))
!       || (nbp > 1
!           && partition_contains_all_rw (rdg, partitions)))
      goto ldist_done;
  
    if (dump_file && (dump_flags & TDF_DETAILS))
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-18.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ldist-18.c    (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-18.c    (revision 0)
***************
*** 0 ****
--- 1,12 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+ 
+ void foo (int *p, int n)
+ {
+   int i;
+   for (i = 0; i < n; ++i)
+     p[i] = 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "generated memset zero" "ldist" } } */
+ /* { dg-final { cleanup-tree-dump "ldist" } } */
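
To make the effect of the condition change in ldist_gen easier to see, here is a small standalone model of the early-exit test before and after the patch. It is only a sketch: skip_old, skip_new and their boolean parameters are invented for this example and stand in for the results of can_generate_builtin and partition_contains_all_rw; none of it is GCC code.

```c
#include <stdbool.h>

/* Old test: bail out whenever there is at most one partition, or when
   one partition contains all reads/writes.  builtin_ok is ignored,
   which is the problem.  */
bool skip_old (int nbp, bool builtin_ok, bool all_rw)
{
  (void) builtin_ok;
  return nbp <= 1 || all_rw;
}

/* New test: a single partition is still processed when it can be
   turned into a builtin; the all-reads/writes check only applies
   when there is more than one partition.  */
bool skip_new (int nbp, bool builtin_ok, bool all_rw)
{
  return nbp == 0
         || (nbp == 1 && !builtin_ok)
         || (nbp > 1 && all_rw);
}
```

The only case where the two differ is nbp == 1 with a partition that can become a builtin, which is exactly the plain memset loop from the testcase.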