Hi,
I am looking at a performance regression in our code. A big loop produces
and uses a lot of temporary variables inside the loop body. The problem
appears to be that the IVOPTS pass creates even more induction variables
(from the original 2 to 27). This causes a lot of register spilling later,
and performance takes a severe hit. I looked into tree-ssa-loop-ivopts.c;
it does call the estimate_reg_pressure_cost function to take the number of
registers into consideration. The second parameter, passed as
data->regs_used, is supposed to represent the register usage before IVOPTS.

  return size + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
                                            data->body_includes_call);

In this case, regs_used is a mere 2 by the following calculation.
Essentially, it only counts loop-invariant registers and ignores all
registers produced or used inside the loop.

  n = 0;
  for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
    {
      phi = gsi_stmt (psi);
      op = PHI_RESULT (phi);

      if (virtual_operand_p (op))
        continue;

      if (get_iv (data, op))
        continue;

      n++;
    }

  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
    {
      struct version_info *info = ver_info (data, j);

      if (info->inv_id && info->has_nonlin_use)
        n++;
    }

  data->regs_used = n;
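
To show why an undercounted regs_used matters, here is a rough standalone
model of how I understand estimate_reg_pressure_cost to behave. It is not
the GCC source: struct target_model, model_reg_pressure_cost and every
number are made up for illustration, and the model ignores the speed/size
distinction and any allocator-specific adjustment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target parameters; real values come from the back end.  */
struct target_model
{
  unsigned avail_regs;     /* registers available to the allocator */
  unsigned clobbered_regs; /* call-clobbered registers */
  unsigned res_regs;       /* registers kept in reserve */
  unsigned reg_cost;       /* per-register cost when close to the limit */
  unsigned spill_cost;     /* per-register cost once spilling is expected */
};

/* Simplified model (an assumption, not the real function): n_new is the
   number of new registers, n_old the number already in use.  */
static unsigned
model_reg_pressure_cost (const struct target_model *t, unsigned n_new,
                         unsigned n_old, bool call_p)
{
  unsigned regs_needed = n_new + n_old;
  unsigned available = t->avail_regs;

  /* A call in the loop body makes call-clobbered registers unusable.  */
  if (call_p)
    {
      assert (t->clobbered_regs <= available);
      available -= t->clobbered_regs;
    }

  /* Plenty of registers left: no penalty at all.  */
  if (regs_needed + t->res_regs <= available)
    return 0;

  /* Close to the limit: mild penalty per new register.  */
  if (regs_needed <= available)
    return t->reg_cost * n_new;

  /* Over the limit: each new register is charged the spill cost.  */
  return t->spill_cost * n_new;
}
```

With a made-up target of { 16, 6, 3, 2, 10 } and no call in the loop,
5 new registers cost 0 when n_old is 2, but 50 when n_old is 12. If
n_old misses the temporaries that live inside the loop body, the model
stays in the "no penalty" case far longer than it should.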

I believe the way regs_used is calculated is seriously flawed,
or estimate_reg_pressure_cost is problematic if n_old is
only supposed to count loop-invariant registers. Either way,
it affects how IVOPTS makes its decisions and could result in
worse code. What do you think? Any ideas on how to improve
this?


Thanks,
Bingfeng
