On Fri, Nov 11, 2016 at 6:50 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
> As described in the covering note, one of the big differences for SVE is that
> things like mode sizes, offsets, and numbers of vector elements can depend
> on a runtime parameter. This message describes how the SVE patches handle
> that and how they deal with vector constants in which the number of elements
> isn't fixed at compile time.
>
>
> Mode sizes and numbers of elements
> ==================================
>
> Having runtime mode sizes and numbers of elements means for example that:
>
>     GET_MODE_SIZE
>     GET_MODE_BITSIZE
>     GET_MODE_PRECISION
>     GET_MODE_NUNITS
>     TYPE_VECTOR_SUBPARTS
>
> are now runtime invariants rather than compile-time constants. The first
> question is what the representation of these runtime invariants should be.
> Two obvious choices are:
>
> (1) Make them tree or rtl expressions (as appropriate for the IR
>     they're part of).
> (2) Use a new representation.
>
> One of the main problems with (1) is that it's much more general than
> we need. If we made something like GET_MODE_SIZE an rtx, it would be
> hard to enforce statically that the value has a suitable form. It would
> also slow down the compiler, including for targets that don't need runtime
> sizes.
>
> We therefore went for approach (2). The idea is to add a new
> "polynomial integer" (poly_int) class that represents a general:
>
>     C0 + C1 * X1 + ... + Cn * Xn
>
> where each coefficient Ci is a compile-time constant and where each
> indeterminate Xi is a nonnegative runtime parameter. The class takes
> "n" and the coefficient type as template parameters, so unlike (1) it
> can continue to occupy less memory than a pointer where appropriate.
>
> The value of "n" for mode sizes and offsets depends on the target.
> For all targets except AArch64, "n" is 1 and the class degenerates
> to a constant.
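To make the polynomial form above concrete, here is a minimal C++ sketch.
It is not GCC's poly_int: the name poly_sketch, its evaluate member and the
use of std::array are assumptions made purely for illustration. Following
the description above, the template takes the number of coefficients and
the coefficient type, so a single coefficient degenerates to a constant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not GCC's poly_int.  Holds N coefficients of type C
// and represents coeffs[0] + coeffs[1] * X1 + ... + coeffs[N-1] * X(N-1).
template <std::size_t N, typename C>
struct poly_sketch {
  std::array<C, N> coeffs;

  // Evaluate the polynomial for concrete nonnegative runtime parameters.
  // With N == 1 there are no parameters and the result is coeffs[0].
  C evaluate(const std::array<C, N - 1> &x) const {
    C value = coeffs[0];
    for (std::size_t i = 1; i < N; ++i)
      value += coeffs[i] * x[i - 1];
    return value;
  }
};

// Coefficient-wise addition: the sum of two such polynomials is again a
// polynomial of the same form, so no IR expression is needed to hold it.
template <std::size_t N, typename C>
poly_sketch<N, C> operator+(const poly_sketch<N, C> &a,
                            const poly_sketch<N, C> &b) {
  poly_sketch<N, C> result{};
  for (std::size_t i = 0; i < N; ++i)
    result.coeffs[i] = a.coeffs[i] + b.coeffs[i];
  return result;
}
```

For example, poly_sketch<2, std::int64_t>{{16, 16}} models a hypothetical
size of 16 + 16X bytes, which evaluates to 16 for X = 0 and to 64 for X = 3,
while poly_sketch<1, std::int64_t>{{8}} is just the constant 8.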
>
> One difficulty with using runtime sizes is that some common questions
> might not be decidable at compile time. E.g. if mode A has size 2 + 2X
> and mode B has size 4, the condition:
>
>     GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)
>
> is true for X<=1 and false for X>=2. It's therefore no longer possible
> for target-independent code to use these kinds of comparison for modes
> that might be vectors. Instead it needs to ask "might the size be <=?"
> or "must the size be <=?".
>
> If a target only has constant sizes, it would be silly for target-specific
> code to have to make the distinction between "may" and "must", since the
> target knows that they amount to the same thing. poly_int therefore
> provides an implicit conversion to a constant if "n" is 1 and if we're
> compiling target-specific code. Whether this conversion is available
> is controlled by a new TARGET_C_FILE macro.
>
> The idea is to allow current targets to compile as-is with very few
> changes while at the same time ensuring that people working on
> target-independent code can be reasonably confident of "doing the right
> thing" for runtime sizes without having to test SVE specifically.
>
> However, even with SVE, all non-vector modes still have a compile-time size.
> In these cases we had two options: use may/must operations anyway, or add
> static type checking to enforce the fact that the mode isn't a vector.
> The latter seemed better in most cases. The patches therefore add the
> following classes to wrap a machine mode enum:
>
>     scalar_int_mode: modes that satisfy SCALAR_INT_MODE_P
>     scalar_float_mode: modes that satisfy SCALAR_FLOAT_MODE_P
>     scalar_mode: modes that hold some kind of scalar
>     complex_mode: modes that hold a complex value
>
> These wrappers have other benefits too. They replace some runtime asserts
> with static type checking and also make sure that the size or precision of
> a vector mode isn't accidentally used instead of the size or precision of
> an element.
> (This sometimes happened when handling vector shifts by a
> scalar amount, for example.)
>
> We reused the is_a<>, as_a<> and dyn_cast<> operators for machine modes.
> E.g.:
>
>     is_a <scalar_mode> (M)
>
> tests whether M is scalar and:
>
>     as_a <scalar_int_mode> (M)
>
> forcibly converts M to a scalar_int_mode, asserting if it isn't one.
> We also used:
>
>     is_a <scalar_int_mode> (M, &RES)
>
> as a convenient way of testing whether M is a scalar_int_mode and
> storing it as one in RES if so. This helps with various multi-line
> "if" statements, particularly in simplification routines.
>
> For consistency, the patches make machine_mode itself a wrapper class
> and rename the enum to machine_mode_enum. FOOmode identifiers have
> the most specific type appropriate to them, so for example DImode is a
> scalar_int_mode and DFmode is a scalar_float_mode. The raw enum values
> are still available with the E_ prefix (e.g. E_DImode) and are useful
> for things like case statements.
>
> I've attached the implementation of poly_int. It contains a big block
> comment at the start describing the approach and summarising the
> available operations.
>
> I've also attached the new version of machmode.h, with the wrapper
> classes described above.
>
> One thing we haven't done but should is add self-tests for the
> poly_int class. A lot of this code was written before self tests
> were available, but one of the reasons for making "n" a template
> parameter was precisely to allow n==2 to be tested on targets that
> don't need runtime parameters.
>
> Note that many things besides the macros above need to become polynomial.
> Other examples include SUBREG_BYTE, frame offsets, frame sizes, and the
> values returned by get_inner_reference.
>
>
> Representing runtime parameters in the IR
> =========================================
>
> Even though we used polynomials rather than IR to encode things like
> mode sizes, we still need a way of representing the runtime parameters
> in IR.
> This is used when incrementing vector ivs and allocating stack
> frames, for example.
>
> There were two ways we considered doing this in rtl:
>
> (1) Add a new rtl code for the poly_ints themselves. This would give
>     constants like:
>
>       (const_poly_int [(const_int C0)
>                        (const_int C1)
>                        ...
>                        (const_int Cn)])
>
>     (although the coefficients could be const_wide_ints instead
>     of const_ints where appropriate). The runtime value would be:
>
>       C0 + C1 * X1 + ... + Cn * Xn
>
> (2) Add a new rtl code for the polynomial indeterminates Xi,
>     then use them in const wrappers. A constant like C0 + C1 * X1
>     would then look like:
>
>       (const:M (plus:M (mult:M (const_param:M X1)
>                                (const_int C1))
>                        (const_int C0)))
>
> There didn't seem to be that much to choose between them. However,
> DWARF location expressions that depend on the SVE vector length use
> a pseudo register to encode that length. This is very similar to the
> const_param used in expression (2), and the DWARF expression would use
> similar arithmetic operations to construct the full polynomial constant.
> We therefore went for (2).
>
> Most uses of rtx polynomial constants use helper functions that abstract
> the underlying representation, so it would be easy to change to (1) (or
> to a third approach) in future.
>
> Unlike rtl, trees have no established practice of wrapping arbitrary
> arithmetic in a const-like wrapper, so (1) seemed like the best approach.
> The patches therefore add a new POLY_CST node that holds one INTEGER_CST
> per coefficient. Again, the actual representation is usually hidden
> behind accessor functions; very little code operates on POLY_CSTs directly.
>
>
> Constructing variable-length vectors
> ====================================
>
> Currently both tree and rtl vector constants require the number of
> elements to be known at compile time and allow the elements to be
> arbitrarily different from one another.
> SVE vector constants instead
> have a variable number of elements and require the constant to have
> some inherent structure, so that the values remain predictable however
> long the vector is. In practice there are two useful types of constant:
>
> (a) a duplicate of a single value to all elements.
>
> (b) a linear series in which element E has the value BASE + E * STEP,
>     for some given BASE and STEP.
>
> For integers, (a) could simply be seen as a special form of (b) in
> which the step is zero. However, we've deliberately not defined (b)
> for floating-point modes, in order to avoid specifying whether element
> E should really be calculated as BASE + E * STEP or as BASE with STEP
> added E times (which would round differently). So treating (a) as a
> separate kind of constant from (b) is useful for floating-point types.
>
> We need to support the same operations for non-constant vectors as well
> as constant ones. Both operations have direct analogues in SVE.
>
> rtl already supports (a) for variables via vec_duplicate. For constants
> we simply wrapped such vec_duplicates in a (const ...), so for example:
>
>     (const:VnnHI (vec_duplicate:VnnHI (const_int 10)))
>
> represents a vector constant in which each element is the 16-bit value 10.
>
> For (b) we created a new vec_series rtl code that takes the base and step
> as operands. A vec_series is constant if it has constant operands, in which
> case it too can be wrapped in a (const ...). For example:
>
>     (const:VnnSI (vec_series:VnnSI (const_int 1) (const_int 3)))
>
> represents the constant { 1, 4, 7, 10, ... }.
>
> We only use constant vec_duplicate and vec_series when the number of
> elements is variable. Vectors with a constant number of elements
> continue to use const_vector. It might be worth considering using
> vec_duplicate across the board in future though, since it's significantly
> more compact when the number of elements is large.
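As a sanity check on the two constant forms described above, the following
C++ sketch expands them for a concrete element count, which in practice is
only known at run time. The names expand_duplicate and expand_series are
hypothetical helpers, not GCC functions, and only integer elements are
modelled, in line with the note that (b) is not defined for floating point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: form (a), every element equals VALUE.
std::vector<std::int64_t> expand_duplicate(std::int64_t value,
                                           std::size_t nelts) {
  return std::vector<std::int64_t>(nelts, value);
}

// Hypothetical helper: form (b), element E has the value BASE + E * STEP.
std::vector<std::int64_t> expand_series(std::int64_t base, std::int64_t step,
                                        std::size_t nelts) {
  std::vector<std::int64_t> out(nelts);
  for (std::size_t e = 0; e < nelts; ++e)
    out[e] = base + static_cast<std::int64_t>(e) * step;
  return out;
}
```

With four elements, expand_series (1, 3, 4) gives { 1, 4, 7, 10 }, matching
the vec_series example, and for integers expand_series (10, 0, n) produces
the same elements as expand_duplicate (10, n) for any n.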
>
> In both vec_duplicate and vec_series constants, the value of the element
> can be any constant that is valid for the element's mode; it doesn't have
> to be a const_int.
>
> The patches take a similar approach for trees. A new VEC_DUPLICATE_EXPR
> returns a vector in which every element is equal to operand 0, while a new
> VEC_SERIES_EXPR creates a linear series, taking the same two operands as the
> rtl code. The trees are TREE_CONSTANT if the operands are TREE_CONSTANT.
>
> The new trees are valid gimple values iff they are TREE_CONSTANT.
> This means that the constant forms can be used in a very similar way
> to VECTOR_CST, rather than always requiring a separate gimple assignment.

Hmm. They are hopefully (at least VEC_DUPLICATE_EXPR) not
GIMPLE_SINGLE_RHS. But it means they'd appear (when TREE_CONSTANT)
as gimple operand in GENERIC form.

> Variable-length permutes
> ========================
>
> SVE has a similar set of permute instructions to Advanced SIMD: it has
> a TBL instruction for general permutes and specific instructions like
> TRN1 for certain common operations. Although it would be possible to
> construct variable-length masks for all the special-purpose permutes,
> the expression to construct things like "interleave high" would be
> relatively complex. It seemed better to add optabs and internal
> functions for certain kinds of well-known operation, specifically:
>
>   - IFN_VEC_INTERLEAVE_HI
>   - IFN_VEC_INTERLEAVE_LO
>   - IFN_VEC_EXTRACT_EVEN
>   - IFN_VEC_EXTRACT_ODD
>   - IFN_VEC_REVERSE

It's a step backwards from a unified representation of permutes in
GIMPLE. I think it would be better to have the internal functions
generate the well-known permute mask instead. Thus you'd have

  mask = IFN_VEC_INTERLEAVE_HI_MASK ();
  vec = VEC_PERM_EXPR <vec1, vec2, mask>;

extract_even/odd should be doable with VEC_SERIES_EXPR, so is
VEC_REVERSE. interleave could use a double-size element mode to use
VEC_SERIES_EXPR with 0004 + n * 0101 to get 0004, 0105, 0206, 0307
for a 4 element vector for example. And then view-convert to the
original size element mode to get at the mask.

I really wonder how you handle arbitrary permutes generated by SLP
loop vectorization ;) (well, I guess "not supported" for the moment).

In general, did you do any compile-time / memory-use benchmarking of
the middle-end changes for a) targets not using variable-size modes,
b) a target having them, with/without the patches? How is debugging
experience (of GCC itself) for targets without variable-size modes
when dealing with RTL?

A question on SVE itself -- is the vector size "fixed" in hardware or
can it change say, per process?
[just thinking of SMT and partitioning of the vector resources]
Given that for HPC everybody recompiles their code for a specific
machine I'd have expected a -mvector-size=N switch to be a more
pragmatic approach for GCC 7 and also one that (if the size is really
"fixed" in HW) might result in better code generation (at least
initially).

Thanks,
Richard.

> These names follow existing target-independent terminology rather than
> the usual AArch64 scheme.
>
> Thanks,
> Richard
>
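
The double-size series idea for interleave masks mentioned above (0004 +
n * 0101, then view-convert) can be checked with a small C++ sketch. The
function name interleave_mask_from_series is hypothetical, 8-bit selectors
packed into 16-bit elements are assumed, and the "high half first" split is
an assumption too: the layout after a real view-convert depends on the
target's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: build a linear series of double-size elements
// BASE + N * STEP and split each one into two selectors, high half first.
std::vector<std::uint8_t> interleave_mask_from_series(std::uint16_t base,
                                                      std::uint16_t step,
                                                      std::size_t wide_nelts) {
  std::vector<std::uint8_t> mask;
  mask.reserve(2 * wide_nelts);
  for (std::size_t n = 0; n < wide_nelts; ++n) {
    // Element N of the double-size series.
    const std::uint16_t wide = static_cast<std::uint16_t>(base + n * step);
    // "View-convert" to the original element size (assumed byte order).
    mask.push_back(static_cast<std::uint8_t>(wide >> 8));
    mask.push_back(static_cast<std::uint8_t>(wide & 0xff));
  }
  return mask;
}
```

Calling interleave_mask_from_series (0x0004, 0x0101, 4) yields the selectors
0, 4, 1, 5, 2, 6, 3, 7, i.e. elements taken alternately from the first and
second input, which is the pattern a single linear series on the narrow
elements could not express.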