On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stu...@gmail.com> wrote: > On 07/07/11 11:26, Andrew Stubbs wrote: >> >> On 07/07/11 10:58, Richard Guenther wrote: >>> >>> I think you should assume that series of widenings, >>> (int)(short)char_variable >>> are already combined. Thus I believe you only need to consider a single >>> conversion in valid_types_for_madd_p. >> >> Hmm, I'm not so sure. I'll look into it a bit further. > > OK, here's a test case that gives multiple conversions: > > long long > foo (long long a, signed char b, signed char c) > { > int bc = b * c; > return a + (short)bc; > } > > The dump right before the widen_mult pass gives: > > foo (long long int a, signed char b, signed char c) > { > int bc; > long long int D.2018; > short int D.2017; > long long int D.2016; > int D.2015; > int D.2014; > > <bb 2>: > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = D.2014_2 * D.2015_4; > D.2017_6 = (short int) bc_5;
Ok, so you have a truncation that is a no-op value-wise. I would argue that this truncation should be removed independent on whether we have a widening multiply instruction or not. The technically most capable place to remove non-value-changing truncations (and combine them with a successive conversion) would be value-range propagation. Which already knows: Value ranges after VRP: b_1(D): VARYING D.2698_2: [-128, 127] c_3(D): VARYING D.2699_4: [-128, 127] bc_5: [-16256, 16384] D.2701_6: [-16256, 16384] D.2702_7: [-16256, 16384] a_8(D): VARYING D.2700_9: VARYING thus truncating bc_5 to short does not change the value. The simplification could be made when looking at the statement > D.2018_7 = (long long int) D.2017_6; in vrp_fold_stmt, based on the fact that this conversion converts from a value-preserving intermediate conversion. Thus the transform would replace the D.2017_6 operand with bc_5. So yes, the case appears - but it shouldn't ;) I'll cook up a quick patch for VRP. Thanks, Richard. > D.2016_9 = D.2018_7 + a_8(D); > return D.2016_9; > > } > > Here we have a multiply and accumulate done the long way. The 8-bit inputs > are widened to 32-bit, multiplied to give a 32-bit result (of which only the > lower 16-bits contain meaningful data), then truncated to 16-bits, and > sign-extended up to 64-bits ready for the 64-bit addition. > > This is slight contrived, perhaps, but not unlike the sort of thing that > might occur when you have inline functions and macros, and most importantly > - it is mathematically valid! > > > So, here's the output from my patched widen_mult pass: > > foo (long long int a, signed char b, signed char c) > { > int bc; > long long int D.2018; > short int D.2017; > long long int D.2016; > int D.2015; > int D.2014; > > <bb 2>: > D.2014_2 = (int) b_1(D); > D.2015_4 = (int) c_3(D); > bc_5 = b_1(D) w* c_3(D); > D.2017_6 = (short int) bc_5; > D.2018_7 = (long long int) D.2017_6; > D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>; > return D.2016_9; > > } > > As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now > redundant. (Ideally, this would be removed now, but in fact it doesn't get > eliminated until the RTL into_cfglayout pass. This is not new behaviour.) > > > My point is that it's possible to have at least two conversions to examine. > Is it possible to have more? I don't know, but once I'm dealing with two I > might as well deal with an arbitrary number. > > Andrew >