Hello! > It is known that LEA splitting is one of the most critical problems > for Atom processors and changes try to improve it through: > 1. More aggressive Lea splitting – do not perform splitting if > only split cost exceeds AGU stall . > 2. Reordering splitting instructions to get better scheduling – > use the farthest defined register for SET instruction, then add > constant offset if any and finally generate add instruction.This gives > +0.5% speedup in geomean for eembc2.0 suite on Atom. > All required testing was done – bootstraps for Atom & Core2, make check. > Note that this fix affects only on Atom processors.
IMO, you should test LEA handling changes on x32 atom, too. With recent changes, you will get lots of zero_extended addresses through these functions, so I think it is worth to benchmark on x32 target. > ChangeLog: > 2012-08-08 Yuri Rumyantsev yuri.s.rumyant...@intel.com > > * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional > argument. > * config/i386/i386.md (ix86_splitt_lea_for_addr) : Add additional > argument curr_insn. > * config/i386/i386.c (find_nearest_reg-def): New function. Find > nearest register definition used in address. (find_nearest_reg_def) > (ix86_split_lea_for_addr) : Do more aggressive lea splitting and > instructions reodering to get opportunities for better scheduling. Please merge entries for ix86_split_lea_for_addr. @@ -16954,7 +16954,7 @@ ix86_lea_outperforms (rtx insn, unsigned int regno0, unsigned int regno1, /* If there is no use in memory addess then we just check that split cost exceeds AGU stall. */ if (dist_use < 0) - return dist_define >= LEA_MAX_STALL; + return dist_define > LEA_MAX_STALL; You didn't described this change in ChangeLog. Does this affect also affect benchmark speedup? +/* Return 0 if regno1 def is nearest to insn and 1 otherwise. */ Watch comment formatting and vertical spaces! +static int +find_nearest_reg_def (rtx insn, int regno1, int regno2) IMO, you can return 0, 1 and 2; with 0 when no definitions are found in the BB, 1 and 2 when either of two regnos are found. Otherwise, please use bool for function type. + if (insn_defines_reg (regno1, regno1, prev)) + return 0; + else if (insn_defines_reg (regno2, regno2, prev)) Please use INVALID_REGNUM as the second argument in the call to insn_defines_reg when looking for only one regno definition. { - emit_insn (gen_rtx_SET (VOIDmode, target, parts.base)); - tmp = parts.index; + rtx tmp1; + /* Try to give more opportunities to scheduler - + choose operand for move instruction with longer + distance from its definition to insn. */ (Hm, I don't think you mean gcc insn scheduler here.) + if (find_nearest_reg_def (insn, regno1, regno2) == 0) + { + tmp = parts.index; /* choose index for move. */ + tmp1 = parts.base; + } + else + { + tmp = parts.base; + tmp1 = parts.index; + } + emit_insn (gen_rtx_SET (VOIDmode, target, tmp)); + if (parts.disp && parts.disp != const0_rtx) + ix86_emit_binop (PLUS, mode, target, parts.disp); + ix86_emit_binop (PLUS, mode, target, tmp1); + return; } (Please use tabs instead of spaces in the added code.) However, this whole new part can be written simply as following (untested): { rtx tmp1; if (find_nearest_reg_def (insn, regno1, regno2) == 0) tmp1 = parts.base, tmp = parts.index; else tmp1 = parts.index, tmp = parts.base; emit_insn (gen_rtx_SET (VOIDmode, target, tmp1); } Please see how tmp is handled further down in the function. - ix86_emit_binop (PLUS, mode, target, tmp); + ix86_emit_binop (PLUS, mode, target, tmp); Please watch accidental whitespace changes. Uros.