On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selva...@atmel.com> wrote: >Hi, > > When analyzing code size differences between a 4.9.x compiler and > trunk for the AVR target, I found quite a few cases where extra > registers were being used to hold smaller types (two 8 bit registers > for a uint8_t, for example). > >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the >point > at which the dumps start to diverge. > > For code like this > >#include <stdint.h> > >uint16_t get(const uint16_t wvalue) >{ > const uint8_t type = (wvalue >> 8); > uint16_t size = 0; > > if (type == 1) > { > size = 20; > } > else if (type == 2) > { > size = 32; > } > return size; >} > > VRP substitutes wvalue for the type variable in the conditionals > (simplify_cond_using_ranges:9506), as it figures out that the range > of wvalue is the same as a uint8_t). That is, code goes from > ><bb 2>: >_3 = wvalue_2(D) >> 8; >type_4 = (const uint8_t) _3; >if (type_4 == 1) > goto <bb 5>; >else > goto <bb 3>; > >to > ><bb 2>: >_3 = wvalue_2(D) >> 8; >type_4 = (const uint8_t) _3; >if (_3 == 1) > goto <bb 5>; >else > goto <bb 3>; > > This "widening" causes later passes to allocate extra registers > (holding zeros for the extra bits) and generate extra comparisons > for the AVR target. > > I found that in the 4.9.2 compiler I was benchmarking against, VRP > didn't figure out the range for wvalue and therefore the folding > didn't happen, which was why the code was better. > > With the native compiler on my machine (gcc 5.2 x86_64) - replacing > uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by > 32 bits instead of 8 shows the same effect - the generated code uses > rdi instead of just di to hold the type variable. > > Is this a bug? Should the folding happen only if the type > conversion was from a smaller type to a bigger one? Or is the backend > supposed to detect this pattern and deal with it?
We should probably avoid widening beyond word_mode (or sth else if there is sth suitable). Richard. >Regards >Senthil > > >details-raw vrp1 dump > >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0, >symbol_order=0) > >;; 1 loops found >;; >;; Loop 0 >;; header 0, latch 1 >;; depth 0, outer -1 >;; nodes: 0 1 2 3 4 5 >;; 2 succs { 5 3 } >;; 3 succs { 4 5 } >;; 4 succs { 5 } >;; 5 succs { 1 } > >ASSERT_EXPRs to be inserted > >Assertions to be inserted for type_4 > if (type_4 == 1) > > BB #3 > EDGE 2->3 2 [55.9%] (FALSE_VALUE,EXECUTABLE) > PREDICATE: type_4 ne_expr 1 > > > > >Updating SSA: >Registering new PHI nodes in block #2 >Updating SSA information for statement type_4 = (const uint8_t) _3; >Updating SSA information for statement if (type_4 == 1) >Registering new PHI nodes in block #3 >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4, >type_4 != 1>; >Updating SSA information for statement if (type_4 == 2) >Registering new PHI nodes in block #4 >Registering new PHI nodes in block #5 > >SSA replacement table >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j > >type_6 -> { type_4 } >Incremental SSA update started at block: 2 >Number of blocks in CFG: 6 >Number of blocks to update: 2 ( 33%) >Affected blocks: 2 3 > > > >SSA form after inserting ASSERT_EXPRs >get (const uint16_t wvalue, const uint8_t windex, const void * * const >address) >{ > uint16_t size; > const uint8_t type; > unsigned int _3; > > <bb 2>: > _3 = wvalue_2(D) >> 8; > type_4 = (const uint8_t) _3; > if (type_4 == 1) > goto <bb 5>; > else > goto <bb 3>; > > <bb 3>: > type_6 = ASSERT_EXPR <type_4, type_4 != 1>; > if (type_6 == 2) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > > <bb 5>: > # size_1 = PHI <20(2), 0(3), 32(4)> > return size_1; > >} > > >Immediate_uses: > >size_1 : --> single use. >return size_1; > >wvalue_2(D) : --> single use. >_3 = wvalue_2(D) >> 8; > >_3 : --> single use. >type_4 = (const uint8_t) _3; > >type_4 : -->3 uses. >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >if (type_4 == 1) > >.MEM_5(D) : --> single use. ># VUSE <.MEM_5(D)> >return size_1; > >type_6 : --> single use. >if (type_6 == 2) > >Adding destination of edge (0 -> 2) to worklist > >Simulating block 2 > >Visiting statement: >_3 = wvalue_2(D) >> 8; >Intersecting > [0, 255] >and > [0, +INF] >to > [0, 255] >Found new range for _3: [0, 255] >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3; >marking stmt to be not simulated again > >Visiting statement: >type_4 = (const uint8_t) _3; >Found new range for type_4: [0, +INF] >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR <type_4, >type_4 != 1>; >interesting_ssa_edges: adding SSA use in if (type_4 == 1) >marking stmt to be not simulated again > >Visiting statement: >if (type_4 == 1) > >Visiting conditional with predicate: if (type_4 == 1) > >With known ranges > type_4: [0, +INF] > >Predicate evaluates to: DON'T KNOW >Adding destination of edge (2 -> 5) to worklist >Adding destination of edge (2 -> 3) to worklist > >Simulating block 3 > >Visiting statement: >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >Intersecting > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >and > [0, +INF] >to > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >Found new range for type_6: ~[1, 1] >interesting_ssa_edges: adding SSA use in if (type_6 == 2) >marking stmt to be not simulated again > >Visiting statement: >if (type_6 == 2) > >Visiting conditional with predicate: if (type_6 == 2) > >With known ranges > type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) > >Predicate evaluates to: DON'T KNOW >Adding destination of edge (3 -> 4) to worklist > >Simulating block 4 > >Simulating block 5 > >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)> > Argument #0 (2 -> 5 executable) > 20: [20, 20] > Argument #1 (3 -> 5 executable) > 0: [0, 0] >Meeting > [20, 20] >and > [0, 0] >to > [0, 20] > Argument #2 (4 -> 5 executable) > 32: [32, 32] >Meeting > [0, 20] >and > [32, 32] >to > [0, 32] >Intersecting > [0, 32] >and > [0, +INF] >to > [0, 32] >Found new range for size_1: [0, 32] >interesting_ssa_edges: adding SSA use in return size_1; >marking stmt to be not simulated again > >Visiting statement: >return size_1; > >Value ranges after VRP: > >size_1: [0, 32] >wvalue_2(D): VARYING >_3: [0, 255] >type_4: [0, +INF] >type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) > > > >Substituting values and folding statements > >Folding statement: _3 = wvalue_2(D) >> 8; >Not folded >Folding statement: type_4 = (const uint8_t) _3; >Not folded >Folding statement: if (type_4 == 1) >Folded into: if (_3 == 1) > >Folding statement: if (type_6 == 2) >Not folded >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)> >No folding possible >Folding statement: return size_1; >Not folded >get (const uint16_t wvalue, const uint8_t windex, const void * * const >address) >{ > uint16_t size; > const uint8_t type; > unsigned int _3; > > <bb 2>: > _3 = wvalue_2(D) >> 8; > type_4 = (const uint8_t) _3; > if (_3 == 1) > goto <bb 5>; > else > goto <bb 3>; > > <bb 3>: > if (type_4 == 2) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > > <bb 5>: > # size_1 = PHI <20(2), 0(3), 32(4)> > return size_1; > >}