Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

Stubbs, Andrew Fri, 01 Jul 2011 06:31:55 -0700

On 01/07/11 13:33, Paolo Bonzini wrote:
> Got it now! Casts from signed to unsigned are not value-preserving, but
> they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
> same result bit-by-bit as the s64 result. The fact that s64 has an
> implicit 1111... in front, while an u64 has an implicit 0000... does not
> matter.


But, the 1111... and 0000... are not implicit. They are very real, and 
if applied incorrectly will change the result, I think.
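
To make the difference concrete, here is a minimal C sketch of the two ways a 32-bit value can be widened to 64 bits. The helper names are made up for illustration and do not come from the patch. The results agree for non-negative inputs, but the upper 32 bits differ for negative ones.

```c
#include <stdint.h>

/* Sign-extend: the upper 32 bits are copies of bit 31.
   This is what a C cast from int32_t to any 64-bit type does. */
static uint64_t sign_extend_32_to_64(int32_t x)
{
    return (uint64_t)(int64_t)x;
}

/* Zero-extend: the upper 32 bits are all zero.
   The value is first reinterpreted as unsigned, then widened. */
static uint64_t zero_extend_32_to_64(int32_t x)
{
    return (uint64_t)(uint32_t)x;
}
```

For x = -1 the first helper yields 0xFFFFFFFFFFFFFFFF and the second yields 0x00000000FFFFFFFF, so picking the wrong extension changes the value that reaches the addition.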

> Is this the meaning of the predicate you want? I think so, based on the
> discussion, but it's hard to say without seeing the cases enumerated
> (i.e. a patch).

The purpose of this predicate is to determine whether any type 
conversions that occur between the output of a widening multiply, and 
the input of an addition have any bearing on the end result.

We know what the effective output type of the multiply is (the size is 
2x the input type, and it is signed if either one of the inputs is 
signed), and we know what the input type of the addition is, but any 
amount of junk can lie in between. The problem is determining if it *is* 
junk.

In an ideal world there would only be two cases to consider:

   1. No conversion needed.

   2. A single sign-extend or zero-extend (according to the type of the 
inputs) to match the input size of the addition.

Anything else would be unsuitable for optimization. Of course, it's 
never that simple, but it should still be possible to boil down a list 
of conversions to one of these cases, if it's valid.
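
One way to picture the "boil down" step is the following C sketch. It is a deliberately conservative model, not the predicate from the patch: the struct, the function name, and the rule itself are assumptions made for illustration. It accepts a chain of conversions only if nothing is truncated and every widening step extends in the way the original type would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an integer type: width in bits plus signedness. */
struct int_type {
    unsigned bits;
    bool is_signed;
};

/* Returns true if converting a value of type `from` through
   chain[0..n-1] is equivalent to either no conversion (case 1) or a
   single extension matching the signedness of `from` (case 2).
   Conservative: it may reject some chains that are in fact harmless. */
static bool chain_is_single_extension(struct int_type from,
                                      const struct int_type *chain,
                                      size_t n)
{
    struct int_type cur = from;

    for (size_t i = 0; i < n; i++) {
        if (chain[i].bits < cur.bits)
            return false;   /* truncation: give up */
        if (chain[i].bits > cur.bits && cur.is_signed != from.is_signed)
            return false;   /* widening with the wrong kind of extension */
        cur = chain[i];     /* same-width sign changes keep the bits */
    }
    return true;
}
```

Under this model s32 -> u64 is accepted, because the widening is done from a signed type and therefore sign-extends, while s32 -> u32 -> u64 is rejected, because the widening step would zero-extend a value that started out signed.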

The signedness of the input to the addition is not significant - the 
code would be the same either way. But it is important not to try to 
zero-extend something that started out signed, and not to sign-extend 
something that started out unsigned.
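
A small C sketch of both points, using a signed 16x16->32 multiply accumulated into 64 bits. The function names are hypothetical, and the widths were picked only to keep the numbers small.

```c
#include <stdint.h>

/* Product sign-extended to 64 bits, then added as unsigned. */
static uint64_t macc_sign_extended(int16_t a, int16_t b, uint64_t acc)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return acc + (uint64_t)(int64_t)prod;
}

/* Same extension, but the addition is done in a signed type.
   The resulting bits match macc_sign_extended (no overflow assumed). */
static uint64_t macc_signed_add(int16_t a, int16_t b, int64_t acc)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (uint64_t)(acc + (int64_t)prod);
}

/* Wrong: the signed product is zero-extended before the addition. */
static uint64_t macc_zero_extended(int16_t a, int16_t b, uint64_t acc)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return acc + (uint64_t)(uint32_t)prod;
}
```

With a = -2, b = 3 and acc = 10, the first two functions both return 4, whereas the zero-extending variant returns 4294967300 (10 + 0xFFFFFFFA).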

> However, perhaps there is a catch. We can do the following thought
> experiment. What would happen if you had multiple widening multiplies?
> Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
> unsigned? I believe in this case you couldn't optimize 8-bit signed to
> 128-bit unsigned. Would your code do it?

My code does not attempt to combine multiple multiplies. In any case, if 
you have two multiplications, surely you have at least three input 
values, so they can't be combined?

It does attempt to combine a multiply and an addition, where a suitable 
madd* insn is available. (This is not new; I'm just trying to do it in 
more cases.)

I have considered the case where you have "(a * b) + (c * d)", but have 
not yet coded anything for it. At present, the code will simply choose 
whichever multiply happens to find itself the first input operand of the 
plus, and ignores the other, even if the first turns out not to be a 
suitable candidate.
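
For reference, the source shape in question looks something like this C sketch (the function name is hypothetical). Since a multiply-and-accumulate folds a single product into an addition, at most one of the two multiplies here can become part of it; the other still needs a separate multiply.

```c
#include <stdint.h>

/* Two widening multiplies (32x32->64) feeding a single addition. */
static int64_t sum_of_products(int32_t a, int32_t b, int32_t c, int32_t d)
{
    return (int64_t)a * b + (int64_t)c * d;
}
```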

Andrew
