On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>
>>> I assume that this is not the right way to fix such a simple performance
>>> anomaly, since we need to do redundant work: combine the load into the
>>> conditional and then split it back in peephole2. Does it look
>>> reasonable? Why should we produce an inefficient instruction that must
>>> be split later?
>>
>> Well, either don't allow this instruction variant from the start, or allow
>> the extra freedom for register allocation this creates.  It doesn't make
>> sense to just reject it being generated by combine - that doesn't address
>> when it materializes in another way.
>
> Please check the attached patch; it implements this limitation in a
> correct way:
> - keeps memory operands for -Os or cold parts of the executable
> - doesn't increase register pressure
> - handles all situations where a memory operand can propagate into the RTX
>
> Yuri, can you please check if this patch fixes the performance problem
> for you?
>
> BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP
> target macro and conditionalize new peephole2 patterns on it.

Looks good to me.  I believe optimize_insn_for_speed_p () only works
reliably during RTL expansion, as it relies on crtl->maybe_hot_insn_p to
be better than function-granular.  I'm quite sure split does not adjust
this via rtl_profile_for_bb () (it probably should, as those predicates
are definitely the correct ones to use).  I think passes that do not
adjust this get whatever is left over by previous passes (instead of the
default).

Honza, I think the pass manager should call default_rtl_profile () before each
RTL pass to avoid this, no?
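
To illustrate the concern, here is a toy model, not GCC source.  It
borrows the names rtl_profile_for_bb, default_rtl_profile and
optimize_insn_for_speed_p, but the bodies, struct toy_bb, the two
"passes" and later_pass_sees_speed are made up for the sketch, and the
single global flag only stands in for crtl->maybe_hot_insn_p.  It
assumes the first pass leaves the profile set to its last (cold) block:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for crtl->maybe_hot_insn_p; not the real GCC state.  */
static bool maybe_hot_insn_p = true;

/* Hypothetical basic block carrying only a hot/cold bit.  */
struct toy_bb { bool hot; };

/* Simplified models of the real functions (same names, toy bodies).  */
static void rtl_profile_for_bb (const struct toy_bb *bb)
{
  maybe_hot_insn_p = bb->hot;
}

static void default_rtl_profile (void)
{
  maybe_hot_insn_p = true;
}

static bool optimize_insn_for_speed_p (void)
{
  return maybe_hot_insn_p;
}

/* A pass that sets the profile per block and leaves the last value behind.  */
static void pass_that_adjusts (const struct toy_bb *bbs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    rtl_profile_for_bb (&bbs[i]);
}

/* A pass that never sets the profile and just queries the predicate.  */
static bool pass_that_does_not_adjust (void)
{
  return optimize_insn_for_speed_p ();
}

/* Returns what the second pass observes.  With reset_between_passes set,
   a hypothetical pass manager resets the profile before the second pass.  */
bool later_pass_sees_speed (bool reset_between_passes)
{
  const struct toy_bb bbs[] = { { true }, { false } };  /* last block is cold */

  default_rtl_profile ();
  pass_that_adjusts (bbs, 2);
  if (reset_between_passes)
    default_rtl_profile ();
  return pass_that_does_not_adjust ();
}
```

In this model, without the reset the second pass inherits the cold
setting of the last block the first pass visited; with the reset it
sees the default again.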

Thanks,
Richard.

> Uros.
