On Fri, Aug 1, 2014 at 12:19 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: > This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. > > These are presently documented as producing a vector with the result in > element 0, and this is inconsistent with their use in tree-vect-loop.c > (which on bigendian targets pulls the bits out of the other end of the > vector result). This leads to bugs on bigendian targets - see > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 for a small testcase. > > I discounted "fixing" the vectorizer (to always read from element 0) and > then making the optab reverse the result on bigendian targets (whose > architectural insn produces the result in lane N-1), as optimization of > vectors in RTL seems unlikely to remove such a reverse/permute and so this > would lead to a performance regression (specifically on PowerPC). > > Instead it seems more natural for the tree code to produce a scalar result > (producing a vector with the result in lane 0 has already caused confusion, > e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html). > > This patch preserves the meaning of the existing optab (producing a result > in lane 0 on little-endian architectures or N-1 on bigendian), thus > generally avoiding the need to change backends. Hence, expr.c extracts an > endianness-dependent element from the optab result to give the result > expected for the tree code. > > Significant complication in the AArch64 backend stems from the existence of > builtins for reduction operations, which are gimple_fold'd to the tree code. > Hence, I introduce new define_expands, and map the existing > __builtin_aarch64_reduc_s{plus,min,max}_<mode> functions to those, with > scalar result types, matching the result of the tree code to which these are > still gimple_folded. > > If the above/proposed solution is acceptable, I'd make a longer patch > series, including some cleanup to tree-vect-loop.c > (vect_create_epilog_for_reduction now has only one case where > extract_scalar_result == true), and separating out AArch64 changes. Further, > I'd like to propose creating a new optab that directly outputs a scalar, as > a migration path away from the existing optab whose meaning is > endianness-dependent, i.e. such that expand_unop falls back to the existing > optab only if the new one is not defined. > > Patch as it stands has been bootstrapped on x86_64 and regression tested on > aarch64 and aarch64_be without regressions. On x86_64 there is a regression > in gcc.target/i386/pr51235.c, where it seems my check in tree-cfg.c is too > strict - we end up with a reduction from "vector (4) unsigned long int" to > "void *". (Even if I modify tree-vect-loop.c to build the REDUC_..._EXPR as > returning the element type of the input vector, its return type is later > changed.) It seems I can "get away with" a less-strict check in tree-cfg.c, > i.e. allowing the case where the modes of the expected and actual result > types match (rather than "useless_type_conversion_p" holding between said > types), but if anyone can suggest an alternative/better check then it'd be > great to hear it...
We should fix the vectorizer code-generation instead. Makes sense to me - non-aarch64 parts of the patch are ok. The optab migration strategy is as well. Thanks, Richard. > --Alan