On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak <ubiz...@gmail.com> wrote: > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener > <richard.guent...@gmail.com> wrote: > >>> I assume that this is not right way for fixing such simple performance >>> anomaly since we need to do redundant work - combine load to >>> conditional and then split it back in peephole2? Does it look >>> reasonable? Why we should produce non-efficient instrucction that must >>> be splitted later? >> >> Well, either don't allow this instruction variant from the start, or allow >> the extra freedom for register allocation this creates. It doesn't make >> sense to just reject it being generated by combine - that doesn't address >> when it materializes in another way. > > Please check the attached patch, it implements this limitation in a correct > way: > - keeps memory operands for -Os or cold parts of the executable > - doesn't increase register pressure > - handles all situations where memory operand can propagate into RTX > > Yuri, can you please check if this patch fixes the performance problem for > you? > > BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP > target macro and conditionalize new peephole2 patterns on it.
Looks good to me. I believe optimize_insn_for_speed_p () only works reliable during RTL expansion as it relies on crtl->maybe_hot_insn_p to be better than function granular. I'm quite sure split does not adjust this (it probably should, as those predicates are definitely the correct ones to use), via rtl_profile_for_bb (). I think passes that do not adjust this get what is left over by previous passes (instead of the default). Honza, I think the pass manager should call default_rtl_profile () before each RTL pass to avoid this, no? Thanks, Richard. > Uros.