Hi,

I am looking at a performance regression in our code. A big loop produces and uses a lot of temporary variables inside the loop body. The problem appears to be that the IVOPTS pass creates even more induction variables (from the original 2 to 27). This causes a lot of register spilling later, and performance takes a severe hit.

I looked into tree-ssa-loop-ivopts.c. It does call the estimate_reg_pressure_cost function to take the number of registers into consideration. The second argument, passed as data->regs_used, is supposed to represent the register usage before IVOPTS:

  return size + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
                                            data->body_includes_call);

In this case, regs_used is a mere 2, as computed by the following code. Essentially, it counts only the loop-invariant registers and ignores all registers produced or used inside the loop.

  n = 0;
  for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
    {
      phi = gsi_stmt (psi);
      op = PHI_RESULT (phi);

      if (virtual_operand_p (op))
        continue;

      if (get_iv (data, op))
        continue;

      n++;
    }

  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
    {
      struct version_info *info = ver_info (data, j);

      if (info->inv_id && info->has_nonlin_use)
        n++;
    }

  data->regs_used = n;

I believe the way regs_used is calculated is seriously flawed, or else estimate_reg_pressure_cost is problematic if n_old is only supposed to count loop-invariant registers. Either way, it affects how IVOPTS makes its decisions and could result in worse code.

What do you think? Any ideas on how to improve this?

Thanks,
Bingfeng
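
P.S. To illustrate the concern, here is a toy model of a threshold-style register pressure cost. It is only a sketch and not the GCC implementation: struct toy_target, toy_reg_pressure_cost, and all the numbers are made up for illustration. I am assuming the cost is zero while the old and new registers together fit comfortably, and becomes a per-register spill cost once they do not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target description; the fields and values are
   assumptions for this sketch, not GCC's target data.  */
struct toy_target
{
  unsigned avail_regs;    /* registers usable inside the loop */
  unsigned reserved_regs; /* head-room we would like to keep free */
  unsigned reg_cost;      /* cost per new register when close to the limit */
  unsigned spill_cost;    /* cost per new register once over the limit */
};

/* Toy cost of adding N_NEW registers to a loop that already uses N_OLD.
   The result depends on N_OLD only through the two thresholds, so an
   underestimated N_OLD can make the cost collapse to zero.  */
unsigned
toy_reg_pressure_cost (const struct toy_target *t, unsigned n_new,
                       unsigned n_old)
{
  assert (t->reserved_regs <= t->avail_regs);

  unsigned needed = n_new + n_old;

  /* Plenty of registers left: adding more looks free.  */
  if (needed + t->reserved_regs <= t->avail_regs)
    return 0;

  /* Close to the limit: charge a small cost per new register.  */
  if (needed <= t->avail_regs)
    return t->reg_cost * n_new;

  /* Over the limit: every new register is charged as a spill.  */
  return t->spill_cost * n_new;
}
```

With a made-up target of { 16, 3, 1, 5 }, adding 10 registers on top of n_old = 2 costs 0 in this model, while the same 10 registers on top of n_old = 20 (as if the temporaries in the loop body were counted) cost 50. That is the kind of gap I suspect is hiding the spilling from IVOPTS.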