Thanks for the quick feedback! Richard Biener <richard.guent...@gmail.com> writes: > On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> It was clear from the SVE reviews that people were unhappy with how >> "special" the variable-length case was. One particular concern was >> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that >> that would in turn lead to different representations of VEC_PERM_EXPRs >> with constant permute vectors, and difficulties in invoking >> vec_perm_const_ok. >> >> This is an RFC for a VECTOR_CST representation that treats each >> specific constant as one instance of an arbitrary-length sequence. >> The reprensentation then extends to variable-length vectors in a >> natural way. >> >> As discussed on IRC, if a vector contains X*N elements for some >> constant N and integer X>0, the main features we need are: >> >> 1) the ability to represent a sequence that duplicates N values >> >> This is useful for SLP invariants. >> >> 2) the ability to represent a sequence that starts with N values and >> is followed by zeros >> >> This is useful for the initial value in a double or SLP reduction >> >> 3) the ability to represent N interleaved series >> >> This is useful for SLP inductions and for VEC_PERM_EXPRs. >> >> For (2), zero isn't necessarily special, since vectors used in an AND >> reduction might need to fill with ones. Also, we might need up to N >> different fill values with mixed SLP operations; it isn't necessarily >> safe to assume that a single fill value will always be enough. >> >> The same goes for (3): there's no reason in principle why the >> steps in an SLP induction should all be the same (although they >> do need to be at the moment). E.g. once we support SLP on: >> >> for (unsigned int i = 0; i < n; i += 2) >> { >> x[i] += 4 + i; >> x[i + 1] += 11 + i * 3; >> } >> >> we'll need {[4, 14], +, [2, 6]}. >> >> So the idea is to represent vectors as P interleaved patterns of the form: >> >> [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...] >> >> where the STEP is always zero (actually null) for non-integer vectors. >> This is effectively projecting a "foreground" value of P elements >> onto an arbitrary-length "background" sequenece, where the background >> sequence contains P parallel linear series. >> >> E.g. to pick an extreme and unlikely example, >> >> [42, 99, 2, 20, 3, 30, 4, 40, ...] >> >> has 2 patterns: >> >> BASE0 = 42, BASE1 = 2, STEP = 1 >> BASE0 = 99, BASE1 = 20, STEP = 10 >> >> The more useful cases are degenerate versions of this general case. >> >> As far as memory consumption goes: the number of patterns needed for a >> fixed-length vector with 2*N elements is always at most N; in the worst >> case, we simply interleave the first N elements with the second N elements. >> The worst-case increase in footprint is therefore N trees for the steps. >> In practice the footprint is usually smaller than it was before, since >> most constants do have a pattern. >> >> The patch below implements this for trees. I have patches to use the >> same style of encoding for CONST_VECTOR and vec_perm_indices, but the >> tree one is probably easiest to read. >> >> The patch only adds the representation. Follow-on patches make more >> use of it (and usually make things simpler; e.g. integer_zerop is no >> longer a looping operation). >> >> Does this look better? > > Yes, the overall design looks good. I wonder why you chose to have > the number of patterns being a power of two? I suppose this is > to have the same number of elements from all patterns in the final > vector (which is power-of-two sized)?
Right. The rtl and vec_perm_indices parts don't have this restriction, since some ports do define non-power-of-2 vectors for internal use. The problem is that, since VECTOR_CSTs are used by the FE, we need to support all valid vector lengths without blowing the 16-bit field. Using the same style of representation as TYPE_VECTOR_SUBPARTS seemed like the safest way of doing that. > I wonder if there exists a vector where say a three-pattern > interleaving would be smaller than a four-pattern one? Only in the non-power-of-2 case. > Given you add flags for various purposes would it make sense to > overload 'step' with a regular element to avoid the storage increase > in case step is unnecessary? This makes it have three elements > which is of course awkward :/ I wondered about keeping it as an array of trees and tacking the steps onto the end as an optional addition. But the idea is that tree_vector_pattern becomes the preferred way of handling constant vectors, if it can be used, so it seemed neater to use in the tree node too. > Few more comments below. > > Otherwise looks good to go! > > Thanks for doing this. > > [...] >> ! /* PATTERNS represents a constant of type TYPE. Compress it into the >> ! canonical form, returning the new vector of patterns. Use CUR and NEXT >> ! as temporary workspace. */ >> ! >> ! static vec<tree_vector_pattern> >> ! compress_vector_patterns (vec<tree_vector_pattern> *cur, >> ! vec<tree_vector_pattern> *next, >> ! tree type, vec<tree_vector_pattern> patterns) >> ! { >> ! while (patterns.length () > 1) >> ! { >> ! unsigned int npatterns = patterns.length (); >> ! gcc_assert ((npatterns & 1) == 0); >> ! unsigned int step = npatterns / 2; >> ! cur->truncate (0); >> ! cur->reserve (step); >> ! bool first_p = npatterns * 2 == TYPE_VECTOR_SUBPARTS (type); >> ! for (unsigned int i = 0; i < step; ++i) >> ! if (!combine_patterns (cur, &patterns[i], &patterns[i + step], >> ! first_p)) >> ! return patterns; >> ! patterns = *cur; >> ! std::swap (cur, next); >> ! } >> ! return patterns; >> ! } > > Wow, that looks complicated ;) Which means it needs some more comments. OK :-) > I wonder if avoiding all this for the special cases we like to handle by > providing vector_cst constructors for one and two pattern cases would > be best. Or even more specialized, not exposing "patterns". We still have build_vector_from_val, and there'll be one for building normal series. But code working on general vectors should usually operate on patterns where possible, so that it extends to variable-length vectors. Such code can't make any assumptions about the shape of an existing vector. We also need the complication above for building vectors from vec<tree>s, so that there's only one canonical representation of a given vector, however it was built. >> ! /* Return true if PATTERNS has an overflow bit set. */ >> ! >> ! static bool >> ! vector_overflow_p (vec<tree_vector_pattern> patterns) >> ! { >> ! unsigned int i; >> ! tree_vector_pattern *pattern; >> ! FOR_EACH_VEC_ELT (patterns, i, pattern) >> ! for (unsigned int j = 0; j < 2; ++j) >> ! if (CONSTANT_CLASS_P (pattern->base[j]) >> ! && TREE_OVERFLOW (pattern->base[j])) >> ! return true; >> ! return false; >> ! } >> ! >> ! /* Return true if PATTERNS represents a duplicate operation. */ >> >> ! static bool >> ! vector_duplicate_p (vec<tree_vector_pattern> patterns) >> ! { >> ! unsigned int i; >> ! tree_vector_pattern *pattern; >> ! FOR_EACH_VEC_ELT (patterns, i, pattern) >> ! { >> ! if (pattern->step && wi::to_wide (pattern->step) != 0) > > I wonder if step == 0 should be canonicalized to NULL_TREE? Kept wondering about that, but couldn't make my mind up which way was better. I'll see how it looks with that change. >> ! return false; >> ! if (!operand_equal_p (pattern->base[0], pattern->base[1], 0)) >> ! return false; >> } >> + return true; >> + } >> + >> + /* Return true if PATTERNS represents a vector series. */ >> + >> + static bool >> + vector_series_p (vec<tree_vector_pattern> patterns) >> + { >> + unsigned int i; >> + tree_vector_pattern *pattern; >> + FOR_EACH_VEC_ELT (patterns, i, pattern) >> + if (!pattern->step >> + || (wi::to_wide (pattern->base[1]) - wi::to_wide (pattern->base[0]) >> + != wi::to_wide (pattern->step))) >> + return false; >> + return true; >> + } >> + >> + /* Build a VECTOR_CST of type TYPE using the patterns in PATTERNS, >> + canonicalizing it first if necessary. */ >> + >> + tree >> + build_vector (tree type, vec<tree_vector_pattern> patterns MEM_STAT_DECL) >> + { >> + auto_vec<tree_vector_pattern, 16> tmp1, tmp2; >> + patterns = compress_vector_patterns (&tmp1, &tmp2, type, patterns); >> + unsigned int npatterns = patterns.length (); >> + >> + gcc_assert (pow2p_hwi (npatterns)); >> + bool overflow_p = vector_overflow_p (patterns); >> + bool duplicate_p = vector_duplicate_p (patterns); >> + bool series_p = vector_series_p (patterns); >> + tree v = make_vector (exact_log2 (npatterns), duplicate_p, series_p); >> + TREE_TYPE (v) = type; >> + TREE_OVERFLOW (v) = overflow_p; >> >> ! memcpy (v->vector.patterns, patterns.address (), >> ! npatterns * sizeof (tree_vector_pattern)); >> return v; >> } >> >> /* Return a new VECTOR_CST node whose type is TYPE and whose values >> + are given by ELTS. */ >> + >> + tree >> + build_vector (tree type, vec<tree> elts MEM_STAT_DECL) >> + { >> + unsigned int nelts = elts.length (); >> + gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); >> + >> + gcc_assert (pow2p_hwi (nelts)); >> + bool single_p = nelts == 1; >> + unsigned int npatterns = single_p ? 1 : nelts / 2; >> + auto_vec<tree_vector_pattern, 16> patterns (npatterns); >> + for (unsigned int i = 0; i < npatterns; ++i) >> + { >> + tree_vector_pattern pattern; >> + pattern.base[0] = elts[i]; >> + pattern.base[1] = elts[single_p ? i : i + npatterns]; >> + if (INTEGRAL_TYPE_P (TREE_TYPE (type)) >> + && TREE_CODE (pattern.base[0]) == INTEGER_CST >> + && TREE_CODE (pattern.base[1]) == INTEGER_CST) >> + pattern.step = wide_int_to_tree (TREE_TYPE (type), >> + wi::to_wide (pattern.base[1]) >> + - wi::to_wide (pattern.base[0])); >> + else >> + pattern.step = NULL_TREE; > > I wonder why given the simple representation with n/2 patterns > you don't use step == NULL_TREE unconditionally here. The idea is to make constants with exactly 2*npatterns elements less special. When a constant ends after exactly 2 patterns, we could choose to extend it with any step. But since the most common use of a series is to have a+b*x (with no special case for the first element), using the difference seemed like the best default choice. E.g. if we want to ask "is element x equal to 0+x*2?", the answer will be the same for {0, 2} and {0, 2, 4, 6}. Of course, this means that "is this equal to { 1, 0, 0, ... }?" does need to treat the 2-pattern case as special. So yeah, using a null step and always treating the 2-pattern case as special would be another alternative. > Is there a target macro specifying the maximum target supported > vector length? It might be nice to use that in the auto_vec <> size. Not that I know of. But this code is laso used for generic vectors that don't have target support, so it might make sense to have a target-independent choice anyway. I can use a typedef to avoid hard-coding the number everywhere. > OTOH sometimes I really like > > tree_vector_pattern *p = XALLOCAVEC (npatterns); > > more... (ok, __attribute__((vector_size(1048576))) might blow the stack...) Yeah :-) Plus it gets awkward in loops. Thanks, Richard