On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: > Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and > 6, > which have been previously approved, in that sequence. (Of those, all bar > patch > 2 are AArch64 only.) I think this is better than maintaining an > ever-expanding > patch series.
Agreed. > Then I'll get to work on migrating all backends to the new _scal_ optab (and > removing the vector optab). Certainly I'd like to replace vec_shr/l with > vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching! I suppose we all are. It will last until end of October at least (stage1 of gcc 4.9 ended Nov 22th, certainly a bit late). I do expect we will continue merging already developed / posted stuff through stage3 (as usual). That said, it would be really nice to get rid of VEC_RSHIFT_EXPR. Thanks, Richard. > --Alan > > > > > Richard Biener wrote: >> >> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <alan.lawre...@arm.com> >> wrote: >>> >>> The end goal here is to remove this code from tree-vect-loop.c >>> (vect_create_epilog_for_reduction): >>> >>> if (BYTES_BIG_ENDIAN) >>> bitpos = size_binop (MULT_EXPR, >>> bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) >>> - >>> 1), >>> TYPE_SIZE (scalar_type)); >>> else >>> >>> as this is the root cause of PR/61114 (see testcase there, failing on all >>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard >>> Biener, >>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is >>> suspicious". The code snippet above is used on two paths: >>> >>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = >>> reduc_[us](plus|min|max)_optab. >>> The optab is documented as "the scalar result is stored in the least >>> significant bits of operand 0", but the tree code as "the first element >>> in >>> the vector holding the result of the reduction of all elements of the >>> operand". This mismatch means that when the tree code is folded, the code >>> snippet above reads the result from the wrong end of the vector. >>> >>> The strategy (as per >>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define >>> new >>> tree codes and optabs that produce scalar results directly; this seems >>> better than tying (the element of the vector into which the result is >>> placed) to (the endianness of the target), and avoids generating extra >>> moves >>> on current bigendian targets. However, the previous optabs are retained >>> for >>> now as a migration strategy so as not to break existing backends; moving >>> individual platforms over will follow. >>> >>> A complication here is on AArch64, where we directly generate >>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily >>> remove this folding in order to decouple the midend and AArch64 backend. >> >> >> Sounds fine. I hope we can transition all backends for 5.0 and remove >> the vector variant optabs (maybe renaming the scalar ones). >> >>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. >>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the >>> optab >>> is defined in an endianness-dependent way, leading to significant >>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab >>> is >>> never used!). Few platforms appear to handle vec_shr_optab (and fewer >>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change >>> the existing optab to be endianness-neutral. >>> >>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 >>> updates that implementation to fit the new endianness-neutral >>> specification, >>> serving as a guide for other existing backends. Patches/RFCs 15 and 16 >>> are >>> equivalents for MIPS and PowerPC; I haven't tested these but hope they >>> act >>> as useful pointers for the port maintainers. >>> >>> Finally patch 14 cleans up the affected part of tree-vect-loop.c >>> (vect_create_epilog_for_reduction). >> >> >> As said during the individual patches review I'd like the vectorizer to >> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with >> only whole-element amounts). This means we can remove >> VEC_RSHIFT_EXPR. It also means that if the backend defines >> vec_perm_const (which it really should) it can handle the special >> permutes that boil down to a possibly more efficient vector shift >> there (a good optimization anyway). Until it does that all backends >> would at least create correct code (with the endian dependent >> vec_shr removed). >> >> Richard. >> >>> --Alan >>> >> > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > > ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, > Registered in England & Wales, Company No: 2557590 > ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, > Registered in England & Wales, Company No: 2548782 >