Hi again, Vladimir,

I am pleased to report some performance improvements after altering ira-costs.c. A key benchmark for us has improved by 5%.

Specifically, in record_reg_classes(), after alt_cost has been calculated and just before it is applied to pp->mem_cost and pp->cost[k], I check whether this particular operand wanted one of our BOTTOM_REGS (r0-r15). If it did, I increase pp->mem_cost by an arbitrary amount, and I also increase pp->cost[k] by an arbitrary amount if k does not represent the BOTTOM_REGS class.

My aim here is to nudge IRA in the right direction for operands that just want BOTTOM_REGS. After experimenting with different values for my "arbitrary amounts", I found some that made IRA more likely to give BOTTOM_REGS to the instructions/operands that want them, because any other registers and memory ended up with costs high enough for IRA to try to avoid them.

I have included a snippet from my version of record_reg_classes() below:

====================================
  op_cost_add = alt_cost * frequency;
  /* Finally, update the costs with the information we've
     calculated about this alternative.  */
  for (i = 0; i < n_ops; i++)
    if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
      {
        struct costs *pp = op_costs[i], *qq = this_op_costs[i];
        int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);

        /* If this operand really wanted a BOTTOM_REG, add an extra
           cost onto memory to nudge IRA away from putting it in
           memory.  */
        if (allocno_pref
            && allocno_pref[ALLOCNO_NUM (ira_curr_regno_allocno_map
                                         [REGNO (ops[i])])]
               == BOTTOM_REGS)
          {
            pp->mem_cost
              = MIN (pp->mem_cost,
                     (qq->mem_cost + op_cost_add
                      + (flag_ira_preferred_register_cost_memory
                         * frequency)) * scale);
          }
        else
          {
            pp->mem_cost = MIN (pp->mem_cost,
                                (qq->mem_cost + op_cost_add) * scale);
          }

        for (k = 0; k < cost_classes_num; k++)
          {
            /* If this operand really wanted a BOTTOM_REG, add an
               extra cost onto any register class that isn't
               BOTTOM_REGS to nudge IRA away from putting it in a
               hard register of that class.  */
            if (allocno_pref
                && allocno_pref[ALLOCNO_NUM (ira_curr_regno_allocno_map
                                             [REGNO (ops[i])])]
                   == BOTTOM_REGS)
              {
                switch (cost_classes[k])
                  {
                  case BOTTOM_REGS:
                    op_cost_add = alt_cost * frequency;
                    break;
                  case TOP_CREGS:
                  case C_REGS:
                    op_cost_add
                      = (alt_cost
                         + flag_ira_preferred_register_cost_register)
                        * frequency;
                    break;
                  default:
                    op_cost_add = alt_cost * frequency;
                    break;
                  }
              }
            pp->cost[k] = MIN (pp->cost[k],
                               (qq->cost[k] + op_cost_add) * scale);
          }
      }
====================================

So far, I have found the best value for flag_ira_preferred_register_cost_memory to be 20 and the best value for flag_ira_preferred_register_cost_register to be 6. I appreciate that these numbers do not really correlate with the other cost units, but they were the ones that made an impact.

In terms of coloring algorithms, we are still seeing better performance with the priority algorithm on our benchmarks, but the cost adjustments above improved both the priority algorithm and the CB algorithm, with ira-region=mixed and ira-region=one.

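To make the effect of the two penalties easier to see in isolation, here is a minimal standalone sketch of the same adjustment outside of GCC. Everything prefixed with toy_ or TOY_ is hypothetical and invented for illustration only: the three-class enum stands in for our port's classes, the two constants reuse the values 20 and 6 mentioned above, and the prefers_bottom flag stands in for the allocno_pref lookup. One deliberate difference from the snippet: the sketch recomputes the addend for every class from alt_cost rather than updating a shared op_cost_add.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the port's register classes.  */
enum toy_class { TOY_BOTTOM_REGS, TOY_TOP_CREGS, TOY_C_REGS, TOY_N_CLASSES };

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Simplified version of the per-operand cost record.  */
struct toy_costs
{
  int mem_cost;
  int cost[TOY_N_CLASSES];
};

/* Assumed penalty values, taken from the numbers reported above.  */
static const int toy_mem_penalty = 20;
static const int toy_reg_penalty = 6;

/* Start every cost at "infinity" so the first alternative always wins.  */
static void
toy_costs_init (struct toy_costs *pp)
{
  int k;

  pp->mem_cost = INT_MAX;
  for (k = 0; k < TOY_N_CLASSES; k++)
    pp->cost[k] = INT_MAX;
}

/* Fold one alternative's costs (QQ) into the running minimum (PP).
   If PREFERS_BOTTOM is nonzero, memory and every class other than
   TOY_BOTTOM_REGS get an extra penalty scaled by FREQUENCY.  */
static void
toy_update_costs (struct toy_costs *pp, const struct toy_costs *qq,
                  int alt_cost, int frequency, int scale,
                  int prefers_bottom)
{
  int k;
  int mem_add = alt_cost * frequency;

  assert (frequency > 0 && scale > 0);

  if (prefers_bottom)
    mem_add += toy_mem_penalty * frequency;
  pp->mem_cost = TOY_MIN (pp->mem_cost, (qq->mem_cost + mem_add) * scale);

  for (k = 0; k < TOY_N_CLASSES; k++)
    {
      /* Recomputed for each class, so no penalty leaks between classes
         or operands.  */
      int reg_add = alt_cost * frequency;

      if (prefers_bottom && k != TOY_BOTTOM_REGS)
        reg_add = (alt_cost + toy_reg_penalty) * frequency;
      pp->cost[k] = TOY_MIN (pp->cost[k], (qq->cost[k] + reg_add) * scale);
    }
}
```

For example, with an alternative whose costs are 4 for memory and 2 for each class, alt_cost of 1, frequency of 10 and scale of 1, an operand that prefers BOTTOM_REGS ends up with a memory cost of 214 and a C_REGS cost of 72, while BOTTOM_REGS stays at 12. Without the preference, memory is 14 and all three classes are 12.
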
If you have any thoughts you'd like to share then I'd definitely be interested, but this post is mainly because you said in a previous email that you wanted to hear my suggestions :)

Best regards,
Ian

>Ian Bolton wrote:
>> I hope you could also make some suggestions as to how I might
>> help IRA work well with our instructions that can only use a
>> subset of the register bank.
>>
>I forgot to write: thanks, it would be interesting for me to see
>your suggestions :)