On Tue, Mar 26, 2013 at 01:23:58AM +0400, Dinar Temirbulatov wrote:
> Hi,
> We noticed some performance gains if we are not using jump over some
> simple switch statements. Here is the idea: Check whether the switch
> statement can be expanded with conditional instructions. In that case
> jump tables should be avoided since some branch instructions can be
> eliminated in further passes (replaced by conditional execution).
> 
>    For example:
>    switch (i)
>    {
>      case 1: sum += 1;
>      case 2: sum += 3;
>      case 3: sum += 5;
>      case 4: sum += 10;
>    }
> 
> Using jump tables the following code will be generated (ARM assembly):
> 
>    ldrcc pc, [pc, r0, lsl #2]
>    b .L5
>    .L0:
>         .word L1
>         .word L2
>         .word L3
>         .word L4
> 
>    .L1:
>         add r3, #1
>    .L2:
>         add r3, #4
>    .L3:
>         add r3, #5
>    .L4:
>         add r3, #10
>    .L5
> 
> Although this code has a constant complexity it can be improved by the
> conditional execution to avoid implicit branching:
> 
>    cmp r0,1
>    addeq r3, #1
>    cmp r0,2
>    addeq r3, #4
>    cmp r0,3
>    addeq r3, #5
>    cmp r0,4
>    addeq r3, #10
> 
> Although the assembly below requires more assembly instructions to be
> executed, it doesn't violate the CPU pipeline (since no branching is
> performed).
>
How simple are other expansions? You can rewrite this as
a={1,4,5,10} 
sum += a[i]

and in similar cases it is posible to set a[i]=0 to simulate nop.

In this example is also second optimization possible, jump table is not
neccessary and address can be computed directly.

> The original version of patch for was developed by Alexey Kravets. I
> measured some performance improvements/regressions using spec 2000 int
> benchmark on Samsumg's exynos 5250. Here is the result:
> 
> before:
>                            Base      Base      Base      Peak
> Peak      Peak
>    Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
>    ------------           --------      --------  --------  --------
> --------  --------
>    164.gzip          1400        287       487*     1400       288       485*
>    175.vpr           1400        376       373*     1400       374       374*
>    176.gcc           1100        121       912*     1100       118       933*
>    181.mcf           1800        242       743*     1800       251       718*
>    186.crafty        1000        159       628*     1000       165       608*
>    197.parser        1800       347       518*     1800       329       547*
>    252.eon           1300       960       135*     1300       960       135*
>    253.perlbmk       1800      214       842*     1800       212       848*
>    254.gap           1100       138       797*     1100       136       806*
>    255.vortex        1900       253       750*     1900       255       744*
>    256.bzip2         1500       237       632*     1500       230       653*
>    300.twolf                                 X                             X
>    SPECint_base2000                       561
>    SPECint2000                                                          563
> 
> After:
>    164.gzip          1400   286       490    *     1400   288       486    *
>    175.vpr           1400   213       656    *     1400   215       650    *
>    176.gcc           1100   119       923    *     1100   118       933    *
>    181.mcf          1800   247       730    *     1800   251       717    *
>    186.crafty        1000   145       688    *     1000   150       664    *
>    197.parser       1800   296       608    *     1800   275       654    *
>    252.eon                                   X                             X
>    253.perlbmk     1800   206       872    *     1800   211       853    *
>    254.gap           1100   133       825    *     1100   131       838    *
>    255.vortex        1900   241       789    *     1900   239       797    *
>    256.bzip2         1500   235       638    *     1500   226       663    *
>    300.twolf                                 X                             X
> 
> The error in 252.eon was due to incorrect setup. Also "if (count >
> 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
> correct, and probably it is better to move this code in the earlier
> stage just before the gimple expand and keep preference expand state
> (jump-tables or not) for every switch statement to avoid dealing with
> the RTL altogether.
> 
>                      thanks, Dinar.




Reply via email to