> On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener
> > <richard.guent...@gmail.com> wrote:
> >
> >>> I assume that this is not the right way to fix such a simple performance
> >>> anomaly, since we need to do redundant work - combine the load into the
> >>> conditional and then split it back in peephole2? Does it look
> >>> reasonable? Why should we produce an inefficient instruction that must
> >>> be split later?
> >>
> >> Well, either don't allow this instruction variant from the start, or allow
> >> the extra freedom for register allocation this creates. It doesn't make
> >> sense to just reject it being generated by combine - that doesn't address
> >> when it materializes in another way.
> >
> > Please check the attached patch, it implements this limitation in a correct
> > way:
> > - keeps memory operands for -Os or cold parts of the executable
> > - doesn't increase register pressure
> > - handles all situations where memory operand can propagate into RTX
> >
> > Yuri, can you please check if this patch fixes the performance problem for
> > you?
> >
> > BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP
> > target macro and conditionalize new peephole2 patterns on it.
>
> Looks good to me. I believe optimize_insn_for_speed_p ()
> only works reliably during RTL expansion, as it relies on
> crtl->maybe_hot_insn_p to be better than function granular. I'm quite sure
> split does not adjust this (it probably should, as those predicates are
> definitely the correct ones to use), via rtl_profile_for_bb (). I
> think passes that do not adjust this get what is left over by previous
> passes (instead of the default).
>
> Honza, I think the pass manager should call default_rtl_profile () before each
> RTL pass to avoid this, no?

Yep, we should get to default_rtl_profile after each pass. We should also fix
passes that do splitting/expansion without properly setting the current basic
block.

Honza
>
> Thanks,
> Richard.
>
> > Uros.
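
For readers following along, here is a rough toy model of the stale-profile problem discussed above. It is not GCC code: every toy_* name and the toy_bb struct are hypothetical stand-ins, and it assumes the default profile means "hot". It only shows how a pass that queries the predicate without setting the profile inherits whatever the previous pass left behind, and how a reset by the pass manager avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the flag behind optimize_insn_for_speed_p ();
   a plain global here, not GCC's real crtl structure.  */
static bool toy_maybe_hot_insn_p = true;

/* Hypothetical basic block carrying only a hotness bit.  */
struct toy_bb { bool hot; };

/* Assumption: the default profile treats instructions as hot.  */
static void toy_default_rtl_profile (void)
{
  toy_maybe_hot_insn_p = true;
}

static void toy_rtl_profile_for_bb (const struct toy_bb *bb)
{
  toy_maybe_hot_insn_p = bb->hot;
}

static bool toy_optimize_insn_for_speed_p (void)
{
  return toy_maybe_hot_insn_p;
}

/* A pass that sets the profile per block but never restores the default.  */
static void toy_pass_expand (const struct toy_bb *bbs, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    toy_rtl_profile_for_bb (&bbs[i]);
}

/* A pass that queries the predicate without setting the profile itself.  */
static bool toy_pass_split_sees_speed (void)
{
  return toy_optimize_insn_for_speed_p ();
}

/* Toy pass manager: optionally resets the profile between passes and
   returns what the second pass observes.  */
bool toy_run_passes (const struct toy_bb *bbs, int n, bool reset_between_passes)
{
  toy_default_rtl_profile ();
  toy_pass_expand (bbs, n);
  if (reset_between_passes)
    toy_default_rtl_profile ();
  return toy_pass_split_sees_speed ();
}
```

If the last block handled by the first pass is cold, calling toy_run_passes without the reset makes the second pass see "optimize for size" for everything, while the reset gives it the default again. Setting the profile per basic block inside the second pass would be the finer-grained fix.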