Ping?
thanks, Cong On Mon, Dec 2, 2013 at 5:06 PM, Cong Hou <co...@google.com> wrote: > Hi Richard > > Could you please take a look at this patch and see if it is ready for > the trunk? The patch is pasted as a text file here again. > > Thank you very much! > > > Cong > > > On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote: >> Hi James >> >> Sorry for the late reply. >> >> >> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh >> <james.greenha...@arm.com> wrote: >>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote: >>>> > Thank you for your detailed explanation. >>>> > >>>> > Once GCC detects a reduction operation, it will automatically >>>> > accumulate all elements in the vector after the loop. In the loop the >>>> > reduction variable is always a vector whose elements are reductions of >>>> > corresponding values from other vectors. Therefore in your case the >>>> > only instruction you need to generate is: >>>> > >>>> > VABAL ops[3], ops[1], ops[2] >>>> > >>>> > It is OK if you accumulate the elements into one in the vector inside >>>> > of the loop (if one instruction can do this), but you have to make >>>> > sure other elements in the vector should remain zero so that the final >>>> > result is correct. >>>> > >>>> > If you are confused about the documentation, check the one for >>>> > udot_prod (just above usad in md.texi), as it has very similar >>>> > behavior as usad. Actually I copied the text from there and did some >>>> > changes. As those two instruction patterns are both for vectorization, >>>> > their behavior should not be difficult to explain. >>>> > >>>> > If you have more questions or think that the documentation is still >>>> > improper please let me know. >>> >>> Hi Cong, >>> >>> Thanks for your reply. >>> >>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and >>> DOT_PROD_EXPR and I see that the same ambiguity exists for >>> DOT_PROD_EXPR. Can you please add a note in your tree.def >>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either: >>> >>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >>> tmp2 = ABS_EXPR (tmp) >>> arg3 = PLUS_EXPR (tmp2, arg3) >>> >>> or: >>> >>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >>> tmp2 = ABS_EXPR (tmp) >>> arg3 = WIDEN_SUM_EXPR (tmp2, arg3) >>> >>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a >>> a value of the same (widened) type as arg3. >>> >> >> >> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I >> mentioned it in tree.def). >> >> >>> Also, while looking for the history of DOT_PROD_EXPR I spotted this >>> patch: >>> >>> [autovect] [patch] detect mult-hi and sad patterns >>> http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html >>> >>> I wonder what the reason was for that patch to be dropped? >>> >> >> It has been 8 years.. I have no idea why this patch is not accepted >> finally. There is even no reply in that thread. But I believe the SAD >> pattern is very important to be recognized. ARM also provides >> instructions for it. >> >> >> Thank you for your comment again! >> >> >> thanks, >> Cong >> >> >> >>> Thanks, >>> James >>>