Hello All,

With the code given below, I expected the PPC compiler (e500mc v4.6.2) to generate a 'memset' zero call for the loop initialization (at '-O3'), but it generates a loop instead.
Case 1:

    int a[18], b[18];

    foo ()
    {
      int i;

      for (i = 0; i < 18; i++)
        a[i] = 0;
    }

With the '-ftree-loop-distribute-patterns' flag, if the test case (taken from the gcc doc) is as shown below, the compiler does generate a 'memset' zero call.

Case 2:

    int a[18], b[18];

    foo ()
    {
      int i;

      for (i = 0; i < 18; i++)
        {
          a[i] = 0;            /* (A) */
          b[i] = a[i] + i;     /* (B) */
        }
    }

Here statements (A) and (B) are split into two loops, and for the first loop the compiler generates a 'memset' zero call. Isn't the same optimization supposed to happen in case 1?

Also, for statement (A) in case 2, when the loop iteration count is < 18 the compiler unrolls the loop, and when it is >= 18 the 'memset' zero call is generated.

Looking at the 'gcc/tree-loop-distribution.c' file:

    static int
    ldist_gen (struct loop *loop, struct graph *rdg,
               VEC (int, heap) *starting_vertices)
    {
      ...
      BITMAP_FREE (processed);
      nbp = VEC_length (bitmap, partitions);

      if (nbp <= 1
          || partition_contains_all_rw (rdg, partitions))
        goto ldist_done;                                  /* (Z) */

      if (dump_file && (dump_flags & TDF_DETAILS))
        dump_rdg_partitions (dump_file, partitions);

      FOR_EACH_VEC_ELT (bitmap, partitions, i, partition)
        /* (Y): the code for generating the built-in 'memset'
           is called from here.  */
        if (!generate_code_for_partition (loop, partition, i < nbp - 1))
          goto ldist_done;

      rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
      update_ssa (TODO_update_ssa_only_virtuals | TODO_update_ssa);

     ldist_done:
      BITMAP_FREE (remaining_stmts);
      .........
      return nbp;
    }

From statement (Z), if the number of distributed loops is <= 1, the code at (Y) that generates the built-in function is never executed.

Is it a good solution to update this conditional check so that a single loop (which is not split) is also handled? Or is there any other place/pass where we can implement this?

Regards,
Rohit
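
P.S. To make the expectation in case 1 concrete, here is a minimal sketch of the equivalence I have in mind. The names foo_loop, foo_memset and all_zero are only illustrative and do not come from the gcc sources. The point is just that the explicit loop and a single memset call leave a[] in the same state, so I would expect the compiler to be free to emit the latter.

```c
#include <string.h>

#define N 18

int a[N];

/* Case 1 as written: an explicit zeroing loop over a[]. */
void foo_loop(void)
{
    int i;

    for (i = 0; i < N; i++)
        a[i] = 0;
}

/* Hand-written form of what I expected to be generated:
   one memset over the whole array. Valid here because an
   int with all bits zero has the value 0. */
void foo_memset(void)
{
    memset(a, 0, sizeof a);
}

/* Illustrative helper: returns 1 if every element of a[] is 0. */
int all_zero(void)
{
    int i;

    for (i = 0; i < N; i++)
        if (a[i] != 0)
            return 0;
    return 1;
}
```

Filling a[] with non-zero values and then calling either foo_loop or foo_memset should make all_zero return 1 in both cases.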