On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> I've been trying to fix some bad tree-ssa related optimisation for
> s390x and have come up with the attached experimental patch.  The
> patch is not really good - it breaks some situations in which the
> optimisation was useful.  With this code:
>
>   void bar(long);
>   void foo (char a)
>   {
>     long l;
>     char b;
>
>     b = a & 63;
>     l = b;
>     if (l > 9)
>       bar(l);
>   }
>
> We get this representation before value range propagation:
>
>   ...
>   a_4 = *p_3(D);
>   b_5 = a_4 & 63;
>   l_6 = (long int) b_5;
>   if (l_6 > 9)
>   ...
>
> Now, there's some code in tree-vrp.c:simplify_cond_using_ranges()
> that folds b_5 into the if condition, because l_6 is just a sign
> extension of b_5, and the value range of l_6 can also be
> represented by the type of b (char).
>
>   if (b_5 > 9)
>
> (On s390x we end up with "a & 63" stored in two separate
> registers, extended to 32 bits in one and to 64 bits in the other,
> adding up to two unnecessary instructions.)
>
> A naive idea to prevent folding in this situation was to suppress
> it if it would introduce a second use of b_5 (i.e. b_5 was only
> used in the cast before) while not eliminating all uses of l_6.
> However, calling has_single_use() for both purposes proves to be
> not good enough, and VRP does not do this kind of optimisation
> yet.  It does not catch cases like
>
>   if (l_6 > 9)
>     ...
>   else if (l_6 > 7)
>     ...
>
> where all occurrences of l_6 could be replaced, and simply looking
> at the use counts is too coarse.
>
> --
>
> Is VRP the right pass to do this optimisation or should a later
> pass rather attempt to eliminate the new use of b_5 instead?  Uli
> has brought up the idea of a mini "sign extend elimination" pass
> that checks if the result of a sign extend could be replaced by the
> original quantity in all places, and if so, eliminates the ssa
> name.  (I guess that won't help with the above code because l is
> also used as a function argument.)  What could a sensible approach
> to deal with the situation look like?

We run into this kind of situation regularly, and for general foldings
in match.pd we settled on single_use () even though it is not perfect;
it is hard to do better when you are doing local optimization.  Note
that the usual complaint is not extra extension instructions but the
increase in register pressure.

As for the question of whether VRP is the right pass to do this, the
answer is two-fold: VRP has the most precise range information, but
the folding itself should be moved to generic code and use
get_range_info ().

Richard.

> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
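
For readers following the thread, the legality side of the folding discussed above (the range of l_6 being representable in the type of b) can be illustrated with a small standalone C sketch. This is not GCC code and does not use GCC internals: struct value_range, range_fits (), can_narrow_to_char () and count_mismatches () are hypothetical names made up for this illustration, and the check is a simplified model of what the quoted message describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for an integer value range [min, max];
   this is not GCC's internal representation.  */
struct value_range {
    long min;
    long max;
};

/* True if every value in R lies within [type_min, type_max].  */
static bool range_fits(struct value_range r, long type_min, long type_max)
{
    return r.min >= type_min && r.max <= type_max;
}

/* Toy legality check: the comparison "l > cst" may be done on the
   narrow operand if both the range of l and the constant are
   representable in char.  */
static bool can_narrow_to_char(struct value_range l_range, long cst)
{
    return range_fits(l_range, CHAR_MIN, CHAR_MAX)
           && cst >= CHAR_MIN && cst <= CHAR_MAX;
}

/* Compare the wide and the narrow condition from foo () for every
   possible char input; returns the number of disagreements.  */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int i = CHAR_MIN; i <= CHAR_MAX; i++) {
        char a = (char)i;
        char b = a & 63;
        long l = b;
        assert(b >= 0 && b <= 63);   /* range implied by the mask */
        if ((l > 9) != (b > 9))
            mismatches++;
    }
    return mismatches;
}
```

Calling can_narrow_to_char () with the range [0, 63] and the constant 9 models the example from the quoted message. Note that this only models whether the narrowed comparison is valid; it says nothing about whether it is profitable, which is the register pressure concern raised in the reply.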