http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46785
           Summary: Doesn't vectorize reduction x += y*y
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: rgue...@gcc.gnu.org
                CC: i...@gcc.gnu.org

When looking at why GCC is so slow with the himeno benchmark in the usual
Phoronix testing I noticed that we do not vectorize the reduction in

    float x[1024];
    float
    test (void)
    {
      int i;
      float gosa = 0.0;
      for (i = 0; i < 1024; ++i)
        {
          float tem = x[i];
          gosa += tem * tem;
        }
      return gosa;
    }

because at analysis time we have

    D.3171_6 = __builtin_powf (tem_5, 2.0e+0);

as the def for the addition which doesn't satisfy is_gimple_assign nor
any of the vinfo tests:

    $3 = {type = undef_vec_info_type, live = 0 '\000',
      in_pattern_p = 0 '\000', read_write_dep = 0 '\000',
      stmt = 0x7ffff7edc908, loop_vinfo = 0x18f77e0, vectype = 0x0,
      vectorized_stmt = 0x0, data_ref_info = 0x0, dr_base_address = 0x0,
      dr_init = 0x0, dr_offset = 0x0, dr_step = 0x0, dr_aligned_to = 0x0,
      related_stmt = 0x0, same_align_refs = 0x18cf7f0,
      def_type = vect_internal_def, slp_type = loop_vect, first_dr = 0x0,
      next_dr = 0x0, same_dr_stmt = 0x0, size = 0, store_count = 0, gap = 0,
      relevant = vect_unused_in_scope,
      cost = {outside_of_loop = 0, inside_of_loop = 0}, bb_vinfo = 0x0,
      vectorizable = 1 '\001'}

As we want to allow internal defs we can also just let calls slip through
here (so we vectorize reductions with veclib vectorized calls as well).

Ira?
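
For readers unfamiliar with what "vectorizing the reduction" would mean for this loop, here is a rough scalar emulation of the transformed loop. It is only a sketch, not what GCC emits: it assumes a vector width of 4 floats, and the names LANES, test_scalar, test_lanes and fill_input are illustrative and do not come from the report. Note that splitting the sum into per-lane partial sums reassociates the float additions, which in general changes rounding; GCC only performs this kind of float reduction vectorization when reassociation is permitted (for example with -ffast-math or -fassociative-math).

```c
#define N 1024
#define LANES 4   /* assumed vector width: 4 floats per vector register */

float x[N];

/* Scalar reduction, same shape as the loop in the report. */
float test_scalar(void)
{
    float gosa = 0.0f;
    for (int i = 0; i < N; ++i) {
        float tem = x[i];
        gosa += tem * tem;
    }
    return gosa;
}

/* Scalar emulation of a vectorized reduction: each "lane" keeps its
   own partial sum, and the lanes are combined once after the loop. */
float test_lanes(void)
{
    float acc[LANES] = { 0.0f, 0.0f, 0.0f, 0.0f };

    /* Main loop: one iteration stands in for one vector multiply-add. */
    for (int i = 0; i < N; i += LANES) {
        for (int l = 0; l < LANES; ++l) {
            float tem = x[i + l];
            acc[l] += tem * tem;
        }
    }

    /* Epilogue: horizontal add of the per-lane partial sums. */
    float gosa = 0.0f;
    for (int l = 0; l < LANES; ++l)
        gosa += acc[l];
    return gosa;
}

/* Hypothetical helper: fill x with small multiples of 0.5 so that every
   square and every partial sum is exactly representable in float. */
void fill_input(void)
{
    for (int i = 0; i < N; ++i)
        x[i] = (float)(i % 8) * 0.5f;
}
```

With the exactly representable inputs from fill_input, both functions return the same value; with arbitrary inputs the two results may differ in the last bits because the additions are performed in a different order.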