On Mon, Nov 4, 2013 at 2:06 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 2a5a2e1..8f5d39a 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
>> wider than the mode of the product. The result is placed in operand 0, which
>> is of the same mode as operand 3.
>>
>> +@cindex @code{ssad@var{m}} instruction pattern
>> +@item @samp{ssad@var{m}}
>> +@cindex @code{usad@var{m}} instruction pattern
>> +@item @samp{usad@var{m}}
>> +Compute the sum of absolute differences of two signed/unsigned elements.
>> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
>> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
>> +equal or wider than the mode of the absolute difference. The result is placed
>> +in operand 0, which is of the same mode as operand 3.
>> +
>> @cindex @code{ssum_widen@var{m3}} instruction pattern
>> @item @samp{ssum_widen@var{m3}}
>> @cindex @code{usum_widen@var{m3}} instruction pattern
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index 4975a64..1db8a49 100644
>
> I'm not sure I follow, and if I do - I don't think it matches what
> you have implemented for i386.
>
> From your text description I would guess the series of operations to be:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   operands[0] = v3 + operands[3]
>
> But if I understand the behaviour of PSADBW correctly, what you have
> actually implemented is:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   v4 = reduce_plus (v3)
>   operands[0] = v4 + operands[3]
>
> To my mind, synthesizing the reduce_plus step will be wasteful for targets
> who do not get this for free with their Absolute Difference step.
> Imagine a simple loop where we have synthesized the reduce_plus, we compute
> partial sums each loop iteration, though we would be better to leave the
> reduce_plus step until after the loop. "REDUC_PLUS_EXPR" would be the
> appropriate Tree code for this.
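To make the loop argument concrete, here is a scalar C model of the two placements of the reduction. It is only a sketch under stated assumptions: LANES stands in for a hypothetical vector width, plain C arrays stand in for vector registers, the function names are made up for illustration, and none of this is GCC code or the i386 expansion. Strategy A matches the second sequence above (reduce_plus on every step, as James reads PSADBW); strategy B matches the lane-wise first sequence, with a single reduce_plus after the loop, which is the step REDUC_PLUS_EXPR would express.

```c
#include <stdint.h>
#include <stdlib.h>

#define LANES 4 /* hypothetical vector width, in lanes */

/* Strategy A: reduce inside the loop.  Every iteration sums its lanes
   (a reduce_plus per iteration) and adds that scalar to the total.
   Assumes n is a multiple of LANES; any tail is ignored in this sketch.  */
int32_t sad_reduce_each_iter(const uint8_t *a, const uint8_t *b, size_t n)
{
    int32_t total = 0;
    for (size_t i = 0; i + LANES <= n; i += LANES) {
        int32_t partial = 0;
        for (int l = 0; l < LANES; l++)          /* per-iteration reduce_plus */
            partial += abs((int)a[i + l] - (int)b[i + l]);
        total += partial;
    }
    return total;
}

/* Strategy B: keep one accumulator per lane inside the loop (lane-wise
   adds only, no cross-lane work), then reduce once after the loop.
   Same multiple-of-LANES assumption as above.  */
int32_t sad_reduce_after_loop(const uint8_t *a, const uint8_t *b, size_t n)
{
    int32_t acc[LANES] = {0};                    /* prolog: zeroed accumulator */
    for (size_t i = 0; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)          /* lane-wise accumulate */
            acc[l] += abs((int)a[i + l] - (int)b[i + l]);

    int32_t total = 0;                           /* epilog: single reduce_plus */
    for (int l = 0; l < LANES; l++)
        total += acc[l];
    return total;
}
```

Both functions return the same total; the difference is where the cross-lane additions happen. In A they cost LANES - 1 additions on every iteration, in B they are paid once, after the loop.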
What do you mean when you use "synthesizing" here? For each pattern, the
only synthesized operation is the one being returned from the pattern
recognizer. In this case, it is USAD_EXPR. The sum reduction has to be
recognized because reductions need a corresponding prolog and epilog,
and that recognition is already done before pattern recognition. Note
that a reduction is not a pattern but a type of vector definition. A
vectorization pattern can still be a reduction operation as long as
STMT_VINFO_RELATED_STMT of this pattern is a reduction operation. You can
check the other two reduction patterns, widen_sum_pattern and
dot_prod_pattern, for reference.

Thank you for your comment!

Cong

>
> I would prefer to see this Tree code not imply the reduce_plus.
>
> Thanks,
> James
>
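For reference, the scalar loops below sketch the source shapes that the three reduction patterns Cong mentions (widen_sum, dot_prod, and the proposed SAD pattern) are aimed at. The function names are hypothetical, and whether GCC actually vectorizes a given loop depends on the target and flags. In each loop `sum` is the reduction variable; following Cong's explanation, it is classified as a reduction before pattern recognition runs, and the recognizer then replaces the final `sum +=` statement (together with the operations feeding it) with a single pattern statement, USAD_EXPR in the SAD case, whose STMT_VINFO_RELATED_STMT is that original reduction statement.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example functions.  Each accumulates into `sum`, the
   loop-carried reduction variable.  */

/* widen_sum shape: narrow elements widened and summed.  */
int32_t widen_sum_loop(const uint8_t *a, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];                          /* uint8_t widened, then added */
    return sum;
}

/* dot_prod shape: widened products summed.  */
int32_t dot_prod_loop(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i]; /* widen, multiply, accumulate */
    return sum;
}

/* SAD shape (the case this patch targets): widened absolute
   differences summed.  */
int32_t sad_loop(const uint8_t *a, const uint8_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs((int)a[i] - (int)b[i]);    /* widen, subtract, abs, accumulate */
    return sum;
}
```

The three loops differ only in what feeds the `sum +=` update, which is why the same reduction prolog/epilog machinery can serve all of them.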