On Tue, Aug 2, 2011 at 4:37 AM, Jiangning Liu <jiangning....@arm.com> wrote: > Hi, > > For the following simple test case, PRE optimization hoists computation > (s!=1) into the default branch of the switch statement, and finally causes > very poor code generation. This problem occurs in both X86 and ARM, and I > believe it is also a problem for other targets. > > int f(char *t) { > int s=0; > > while (*t && s != 1) { > switch (s) { > case 0: > s = 2; > break; > case 2: > s = 1; > break; > default: > if (*t == '-') > s = 1; > break; > } > t++; > } > > return s; > } > > Taking X86 as an example, with option "-O2" you may find 52 instructions > generated like below, > > 00000000 <f>: > 0: 55 push %ebp > 1: 31 c0 xor %eax,%eax > 3: 89 e5 mov %esp,%ebp > 5: 57 push %edi > 6: 56 push %esi > 7: 53 push %ebx > 8: 8b 55 08 mov 0x8(%ebp),%edx > b: 0f b6 0a movzbl (%edx),%ecx > e: 84 c9 test %cl,%cl > 10: 74 50 je 62 <f+0x62> > 12: 83 c2 01 add $0x1,%edx > 15: 85 c0 test %eax,%eax > 17: 75 23 jne 3c <f+0x3c> > 19: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi > 20: 0f b6 0a movzbl (%edx),%ecx > 23: 84 c9 test %cl,%cl > 25: 0f 95 c0 setne %al > 28: 89 c7 mov %eax,%edi > 2a: b8 02 00 00 00 mov $0x2,%eax > 2f: 89 fb mov %edi,%ebx > 31: 83 c2 01 add $0x1,%edx > 34: 84 db test %bl,%bl > 36: 74 2a je 62 <f+0x62> > 38: 85 c0 test %eax,%eax > 3a: 74 e4 je 20 <f+0x20> > 3c: 83 f8 02 cmp $0x2,%eax > 3f: 74 1f je 60 <f+0x60> > 41: 80 f9 2d cmp $0x2d,%cl > 44: 74 22 je 68 <f+0x68> > 46: 0f b6 0a movzbl (%edx),%ecx > 49: 83 f8 01 cmp $0x1,%eax > 4c: 0f 95 c3 setne %bl > 4f: 89 df mov %ebx,%edi > 51: 84 c9 test %cl,%cl > 53: 0f 95 c3 setne %bl > 56: 89 de mov %ebx,%esi > 58: 21 f7 and %esi,%edi > 5a: eb d3 jmp 2f <f+0x2f> > 5c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi > 60: b0 01 mov $0x1,%al > 62: 5b pop %ebx > 63: 5e pop %esi > 64: 5f pop %edi > 65: 5d pop %ebp > 66: c3 ret > 67: 90 nop > 68: b8 01 00 00 00 mov $0x1,%eax > 6d: 5b pop %ebx > 6e: 5e pop %esi > 6f: 5f pop %edi > 70: 5d pop %ebp > 71: c3 ret > > But with command line option "-O2 -fno-tree-pre", there are only 12 > instructions generated, and the code would be very clean like below, > > 00000000 <f>: > 0: 55 push %ebp > 1: 31 c0 xor %eax,%eax > 3: 89 e5 mov %esp,%ebp > 5: 8b 55 08 mov 0x8(%ebp),%edx > 8: 80 3a 00 cmpb $0x0,(%edx) > b: 74 0e je 1b <f+0x1b> > d: 80 7a 01 00 cmpb $0x0,0x1(%edx) > 11: b0 02 mov $0x2,%al > 13: ba 01 00 00 00 mov $0x1,%edx > 18: 0f 45 c2 cmovne %edx,%eax > 1b: 5d pop %ebp > 1c: c3 ret > > Do you have any idea about this?
If you look at what PRE does it is clear that it's a profitable transformation as it reduces the number of instructions computed on the paths where s is set to a constant. If you ask for size optimization you won't get this transformation. Now - on the tree level we don't see the if-conversion opportunity (we don't have a conditional move instruction), but still we could improve phiopt to handle switch statements - not sure what we could do with the case in question though which is <bb 3>: switch (s_3) <default: <L3>, case 0: <L11>, case 2: <L10>> <L11>: goto <bb 7> (<L10>); <L3>: if (D.2703_6 == 45) goto <bb 6>; else goto <bb 7> (<L10>); <bb 6>: # s_2 = PHI <2(4), 1(3), 1(6), s_3(5)> <L10>: as the condition in at L3 complicates things. The code is also really really special (and written in an awful way ...). Richard. > Thanks, > -Jiangning > > > >