On Tue, Oct 24, 2017 at 1:18 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>>> <richard.sandif...@linaro.org> wrote:
>>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is
>>>>> encoded in the 10-bit precision field and was previously always stored
>>>>> as a simple log2 value. The challenge was to use this 10 bits to
>>>>> encode the number of elements in variable-length vectors, so that
>>>>> we didn't need to increase the size of the tree.
>>>>>
>>>>> In practice the number of vector elements should always have the form
>>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>>> The patch therefore uses the low bit to select between constant-length
>>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>>
>>>>> A new valid_vector_subparts_p function tests whether a given number
>>>>> of elements can be encoded. This is false for the vector modes that
>>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>>> of vectors rather than single vectors).
>>>>>
>>>>> Most of the patch is mechanical; previous patches handled the changes
>>>>> that weren't entirely straightforward.
>>>>
>>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>>> elsewhere).
>>>>
>>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>>> for the flag? That way the 8 bits for the shift amount can be eventually
>>>> accessed in a more efficient way.
>>>>
>>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>>> accessor on aarch64 / x86_64.
>>>
>>> Ah, yeah. I'll give that a go.
>>>
>>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>>> have variable length vector modes?
>>>
>>> Right. 1 is the default and only AArch64 defines it to anything else (2).
>>
>> Going to be interesting (bitrot) times then? I wonder if it makes sense
>> to initially define it to 2 globally and only change it to 1 later?
>
> Well, the target-independent code doesn't have the implicit conversion
> from poly_int<1, C> to C, so it can't e.g. do:
>
>   poly_int64 x = ...;
>   HOST_WIDE_INT y = x;
>
> even when NUM_POLY_INT_COEFFS==1. Only target-specific code (identified
> by IN_TARGET_CODE) can do that.
>
> So to target-independent code it doesn't really matter what
> NUM_POLY_INT_COEFFS is. Even if we bumped it to 2, the extra coefficient
> would always be zero.
>
> FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
> supported) 3 for all targets, so that part isn't sensitive to
> NUM_POLY_INT_COEFFS.
>
>> Do you have any numbers on the effect of poly-int on compile-times?
>> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?
>
> I've just tried that for an x86_64 -j24 build and got:
>
> real: +7%
> user: +8.6%
>
> I don't know how noisy the results are though.
What's the same measurement on AARCH64, where NUM_POLY_INT_COEFFS is 2?

> It's compile-time neutral in terms of running a gcc built with
> --enable-checking=release, within a margin of about [-0.1%, 0.1%].

I would have expected that (on x86_64). Well, hoped (you basically
stated that in 000/nnn). The question is what the effect is on AARCH64.
As you know, we build openSUSE for AARCH64 and build power is limited ;)

Richard.

> Thanks,
> Richard
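
For reference, the 10-bit scheme from the quoted patch description (low bit selects constant-length vs. variable-length, upper 9 bits hold log2(N)) could be sketched roughly as below. This is only an illustration under stated assumptions: the type and function names (subparts, encodable_p, encode, decode) are hypothetical and are not the accessors from the patch, and the sketch ignores how the field is actually stored in the tree.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: hypothetical names, not the GCC implementation.
// An element count has the form coeff0 + coeff1 * X, where X is the
// runtime value.  Constant-length vectors have coeff1 == 0;
// variable-length vectors have coeff1 == coeff0 (the "N + N * X" form).
struct subparts {
  uint64_t coeff0;
  uint64_t coeff1;
};

// True if n is a nonzero power of 2.
static bool is_pow2(uint64_t n) { return n != 0 && (n & (n - 1)) == 0; }

// True if s fits the scheme: N is a power of 2 and the runtime
// coefficient is either 0 or N.
static bool encodable_p(const subparts &s) {
  return is_pow2(s.coeff0) && (s.coeff1 == 0 || s.coeff1 == s.coeff0);
}

// Pack s into the field: bit 0 = variable-length flag, bits 1..9 = log2(N).
static unsigned encode(const subparts &s) {
  assert(encodable_p(s));
  unsigned log2n = 0;
  for (uint64_t n = s.coeff0; n > 1; n >>= 1)
    ++log2n;
  return (log2n << 1) | (s.coeff1 != 0 ? 1u : 0u);
}

// Unpack a field produced by encode().  Assumes log2(N) < 64.
static subparts decode(unsigned field) {
  uint64_t n = uint64_t(1) << (field >> 1);
  return subparts{n, (field & 1) ? n : 0};
}
```

With this layout, a variable-length count of 4 + 4 * X packs to 5 (log2 = 2, flag = 1), while a count such as 12 (e.g. three 4-element vectors) is rejected because 12 is not a power of 2.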