> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote:
> > Thank you for your detailed explanation.
> >
> > Once GCC detects a reduction operation, it will automatically
> > accumulate all elements in the vector after the loop. In the loop the
> > reduction variable is always a vector whose elements are reductions of
> > corresponding values from other vectors. Therefore in your case the
> > only instruction you need to generate is:
> >
> >     VABAL ops[3], ops[1], ops[2]
> >
> > It is OK if you accumulate the elements into one in the vector inside
> > of the loop (if one instruction can do this), but you have to make
> > sure other elements in the vector should remain zero so that the final
> > result is correct.
> >
> > If you are confused about the documentation, check the one for
> > udot_prod (just above usad in md.texi), as it has very similar
> > behavior as usad. Actually I copied the text from there and did some
> > changes. As those two instruction patterns are both for vectorization,
> > their behavior should not be difficult to explain.
> >
> > If you have more questions or think that the documentation is still
> > improper please let me know.

Hi Cong,

Thanks for your reply.

I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
DOT_PROD_EXPR and I see that the same ambiguity exists for
DOT_PROD_EXPR. Can you please add a note in your tree.def
that SAD_EXPR, like DOT_PROD_EXPR, can be expanded as either:

  tmp = WIDEN_MINUS_EXPR (arg1, arg2)
  tmp2 = ABS_EXPR (tmp)
  arg3 = PLUS_EXPR (tmp2, arg3)

or:

  tmp = WIDEN_MINUS_EXPR (arg1, arg2)
  tmp2 = ABS_EXPR (tmp)
  arg3 = WIDEN_SUM_EXPR (tmp2, arg3)

where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a
value of the same (widened) type as arg3.

Also, while looking for the history of DOT_PROD_EXPR I spotted this
patch:

  [autovect] [patch] detect mult-hi and sad patterns
  http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html

I wonder what the reason was for that patch to be dropped?

Thanks,
James
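
P.S. To make the point about lane placement concrete, here is a minimal
scalar sketch in C. It is only a model of the semantics discussed above,
not GCC's expansion: LANES, abs_diff, reduce_lanes and the three sad_*
functions are hypothetical names chosen for illustration, and the
4-lane accumulator is an assumption. It shows that the sum after the
final reduction is the same whether every lane accumulates its own
partial sum or everything is folded into one lane while the other
lanes stay zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumed accumulator width, for illustration only */

/* Widened absolute difference of two unsigned 8-bit values. */
static uint32_t abs_diff(uint8_t x, uint8_t y)
{
    return x > y ? (uint32_t)(x - y) : (uint32_t)(y - x);
}

/* Plain scalar reference: sum of |a[i] - b[i]|. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Stands in for the reduction epilogue that runs once after the loop. */
static uint32_t reduce_lanes(const uint32_t acc[LANES])
{
    uint32_t sum = 0;
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];
    return sum;
}

/* Variant 1: each lane keeps its own partial sum inside the loop. */
static uint32_t sad_lanewise(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i++)
        acc[i % LANES] += abs_diff(a[i], b[i]);
    return reduce_lanes(acc);
}

/* Variant 2: everything is folded into lane 0; the other lanes
   must remain zero so that the epilogue still gives the right sum. */
static uint32_t sad_single_lane(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i++)
        acc[0] += abs_diff(a[i], b[i]);
    return reduce_lanes(acc);
}
```

Calling all three functions on the same pair of arrays should give the
same value, which is why the documentation can leave open which lanes
receive which partial sums.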