On Tue, Mar 26, 2013 at 01:23:58AM +0400, Dinar Temirbulatov wrote: > Hi, > We noticed some performance gains if we are not using jump over some > simple switch statements. Here is the idea: Check whether the switch > statement can be expanded with conditional instructions. In that case > jump tables should be avoided since some branch instructions can be > eliminated in further passes (replaced by conditional execution). > > For example: > switch (i) > { > case 1: sum += 1; > case 2: sum += 3; > case 3: sum += 5; > case 4: sum += 10; > } > > Using jump tables the following code will be generated (ARM assembly): > > ldrcc pc, [pc, r0, lsl #2] > b .L5 > .L0: > .word L1 > .word L2 > .word L3 > .word L4 > > .L1: > add r3, #1 > .L2: > add r3, #4 > .L3: > add r3, #5 > .L4: > add r3, #10 > .L5 > > Although this code has a constant complexity it can be improved by the > conditional execution to avoid implicit branching: > > cmp r0,1 > addeq r3, #1 > cmp r0,2 > addeq r3, #4 > cmp r0,3 > addeq r3, #5 > cmp r0,4 > addeq r3, #10 > > Although the assembly below requires more assembly instructions to be > executed, it doesn't violate the CPU pipeline (since no branching is > performed). > How simple are other expansions? You can rewrite this as a={1,4,5,10} sum += a[i]
and in similar cases it is posible to set a[i]=0 to simulate nop. In this example is also second optimization possible, jump table is not neccessary and address can be computed directly. > The original version of patch for was developed by Alexey Kravets. I > measured some performance improvements/regressions using spec 2000 int > benchmark on Samsumg's exynos 5250. Here is the result: > > before: > Base Base Base Peak > Peak Peak > Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio > ------------ -------- -------- -------- -------- > -------- -------- > 164.gzip 1400 287 487* 1400 288 485* > 175.vpr 1400 376 373* 1400 374 374* > 176.gcc 1100 121 912* 1100 118 933* > 181.mcf 1800 242 743* 1800 251 718* > 186.crafty 1000 159 628* 1000 165 608* > 197.parser 1800 347 518* 1800 329 547* > 252.eon 1300 960 135* 1300 960 135* > 253.perlbmk 1800 214 842* 1800 212 848* > 254.gap 1100 138 797* 1100 136 806* > 255.vortex 1900 253 750* 1900 255 744* > 256.bzip2 1500 237 632* 1500 230 653* > 300.twolf X X > SPECint_base2000 561 > SPECint2000 563 > > After: > 164.gzip 1400 286 490 * 1400 288 486 * > 175.vpr 1400 213 656 * 1400 215 650 * > 176.gcc 1100 119 923 * 1100 118 933 * > 181.mcf 1800 247 730 * 1800 251 717 * > 186.crafty 1000 145 688 * 1000 150 664 * > 197.parser 1800 296 608 * 1800 275 654 * > 252.eon X X > 253.perlbmk 1800 206 872 * 1800 211 853 * > 254.gap 1100 133 825 * 1100 131 838 * > 255.vortex 1900 241 789 * 1900 239 797 * > 256.bzip2 1500 235 638 * 1500 226 663 * > 300.twolf X X > > The error in 252.eon was due to incorrect setup. Also "if (count > > 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look > correct, and probably it is better to move this code in the earlier > stage just before the gimple expand and keep preference expand state > (jump-tables or not) for every switch statement to avoid dealing with > the RTL altogether. > > thanks, Dinar.