Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-19 Thread Uros Bizjak
On Fri, Dec 14, 2012 at 11:47 AM, Yuri Rumyantsev wrote: > With your new fix that adds if-then-else splitting for memory operands, I > got the expected performance speed-up: +6.7% for Atom and +8.4% for SNB. > We need to do all the testing this weekend, and I will get you our final > feedback on Monday. Af

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-14 Thread Yuri Rumyantsev
Hi Uros, With your new fix that adds if-then-else splitting for memory operands, I got the expected performance speed-up: +6.7% for Atom and +8.4% for SNB. We need to do all the testing this weekend, and I will get you our final feedback on Monday. Thanks in advance for all your help. Yuri. 2012/12/13 Uros Bizj

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Jan Hubicka
> > Honza, I think the pass manager should call default_rtl_profile () before > > each > > RTL pass to avoid this, no? > > Please note that we have plenty of existing peephole2s that use > optimize_insn_for_speed_p predicate. It is assumed to work ... It is set by peep2 pass static void peephole

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Jan Hubicka
> On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote: > > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener > > wrote: > > > >>> I assume that this is not the right way to fix such a simple performance > >>> anomaly, since we need to do redundant work: combine the load into the > >>> conditional and then split

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 4:02 PM, Yuri Rumyantsev wrote: > We did not see any performance improvement on Atom in 32-bit mode on > routelookup from eembc_2_0 (eembc_1_1). I assume that for x86_64 the patch works as expected. Let's take a bigger hammer for 32-bit targets - the splitter that effectiv

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Yuri Rumyantsev
Uros, We did not see any performance improvement on Atom in 32-bit mode on routelookup from eembc_2_0 (eembc_1_1). Best regards. Yuri. 2012/12/13 Uros Bizjak: > On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote: > >>> The patch proposed by Uros is useless since we don't have a free scratch >>>

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote: >> The patch proposed by Uros is useless since we don't have a free scratch >> register to split the memory operand: >> >> ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] >> 17[flags] >> >> ... >> >> (insn 96 131 13

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote: > Hi Guys, > > The patch proposed by Uros is useless since we don't have a free scratch > register to split the memory operand: > > ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] > 17[flags] > > ... > > (insn 96

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote: > The patch proposed by Uros is useless since we don't have a free scratch > register to split the memory operand: > > ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] > 17[flags] > > ... > > (insn 96 131 132 7 (

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Yuri Rumyantsev
Hi Guys, The patch proposed by Uros is useless since we don't have a free scratch register to split the memory operand: ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags] ... (insn 96 131 132 7 (set (reg/v/f:SI 6 bp [orig:70 trie_root ] [70]) (if_the

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Thu, Dec 13, 2012 at 11:20 AM, Uros Bizjak wrote: > On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener > wrote: > > I assume that this is not the right way to fix such a simple performance > anomaly, since we need to do redundant work: combine the load into the > conditional and then split it ba

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener wrote: I assume that this is not the right way to fix such a simple performance anomaly, since we need to do redundant work: combine the load into the conditional and then split it back in peephole2. Does that look reasonable? Why should we p

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote: > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener > wrote: > >>> I assume that this is not the right way to fix such a simple performance >>> anomaly, since we need to do redundant work: combine the load into the >>> conditional and then split it back in pe

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Henderson
On 12/12/2012 10:32 AM, Uros Bizjak wrote: > Please check the attached patch, it implements this limitation in a correct > way: > - keeps memory operands for -Os or cold parts of the executable > - doesn't increase register pressure > - handles all situations where memory operand can propagate int

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener wrote: >> I assume that this is not the right way to fix such a simple performance >> anomaly, since we need to do redundant work: combine the load into the >> conditional and then split it back in peephole2. Does that look >> reasonable? Why should we produce no

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 3:39 PM, Yuri Rumyantsev wrote: > Guys, > > I assume that this is not the right way to fix such a simple performance > anomaly, since we need to do redundant work: combine the load into the > conditional and then split it back in peephole2. Does that look > reasonable? Why should we prod

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Guys, I assume that this is not the right way to fix such a simple performance anomaly, since we need to do redundant work: combine the load into the conditional and then split it back in peephole2. Does that look reasonable? Why should we produce an inefficient instruction that must be split later? Best re

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak wrote: > On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener > wrote: > >>> This fix aims to remove a performance degradation that appeared with the new >>> LRA phase but is in fact a combine problem. The GCC combiner >>> propagates a memory load into an if-then-

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener wrote: >> This fix aims to remove a performance degradation that appeared with the new >> LRA phase but is in fact a combine problem. The GCC combiner >> propagates a memory load into an if-then-else expression, which the old >> reload phase used to split back out. LR

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Richard, I assume that this fix does not affect code size, since such a pattern occurs very rarely, although I can add a check for it if you insist. Register pressure is not an issue here, since I assume that an additional fill won't affect performance as much as a cmove with a memory operand. I decided to not

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev wrote: > Hi Uros, > > This fix is for all x86 platforms; we tested it on core2/corei7, > atom/atom2 and AMD and got a performance improvement of +6% to +11%. So I > don't think we need to introduce an additional tune feature. > > Sorry for my typo with gcc

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Uros, This fix is for all x86 platforms; we tested it on core2/corei7, atom/atom2 and AMD and got a performance improvement of +6% to +11%. So I don't think we need to introduce an additional tune feature. Sorry for my typo with the gcc version - I meant mainline only, since 4.7 does not use LRA. Thanks. Yu

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote: > Hi All, > > This fix aims to remove a performance degradation that appeared with the new > LRA phase but is in fact a combine problem. The GCC combiner > propagates a memory load into an if-then-else expression, which the old > reload p

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote: > This fix aims to remove a performance degradation that appeared with the new > LRA phase but is in fact a combine problem. The GCC combiner > propagates a memory load into an if-then-else expression, which the old > reload phase used to split back out. LRA d

[PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi All, This fix aims to remove a performance degradation that appeared with the new LRA phase but is in fact a combine problem. The GCC combiner propagates a memory load into an if-then-else expression, which the old reload phase used to split back out. LRA does not perform such splitting. To avoid performance slowd
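
For readers following the thread without the patch at hand, here is a rough C-level sketch of the two shapes being argued about. It is not taken from the patch or from the routelookup benchmark; the struct and function names are hypothetical. The transformation discussed in the thread happens on RTL inside the compiler, so this is only an analogy. Whether a given compiler emits a conditional move with a memory operand for the first form depends on the version, flags and target.

```c
#include <stddef.h>

/* Hypothetical node type, loosely modelled on a trie walk. */
struct node {
    struct node *child[2];
};

/* "Combined" shape: the load of the child pointer sits inside the
   conditional.  On x86 this may be emitted as a conditional move
   whose source operand is in memory.  */
const struct node *step_folded(const struct node *cur,
                               const struct node *fallback,
                               unsigned bit, int take)
{
    return take ? cur->child[bit & 1] : fallback;
}

/* "Split" shape: the load goes into a temporary first, and the
   conditional then selects between two register values.  Note that
   the load is unconditional here, so cur must always be valid.  */
const struct node *step_split(const struct node *cur,
                              const struct node *fallback,
                              unsigned bit, int take)
{
    const struct node *loaded = cur->child[bit & 1];
    return take ? loaded : fallback;
}
```

Both functions return the same result whenever `cur` is a valid pointer; the difference is only in whether the load is folded into the select or needs an extra register, which is the scratch-register trade-off raised in the replies above.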