Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>>>> encoded in the 10-bit precision field and was previously always stored
>>>> as a simple log2 value.  The challenge was to use this 10 bits to
>>>> encode the number of elements in variable-length vectors, so that
>>>> we didn't need to increase the size of the tree.
>>>>
>>>> In practice the number of vector elements should always have the form
>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>> The patch therefore uses the low bit to select between constant-length
>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>
>>>> A new valid_vector_subparts_p function tests whether a given number
>>>> of elements can be encoded.  This is false for the vector modes that
>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>> of vectors rather than single vectors).
>>>>
>>>> Most of the patch is mechanical; previous patches handled the changes
>>>> that weren't entirely straightforward.
>>>
>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>> elsewhere).
>>>
>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>> for the flag?  That way the 8 bits for the shift amount can be eventually
>>> accessed in a more efficient way.
>>>
>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>> accessor on aarch64 / x86_64.
>>
>> Ah, yeah.  I'll give that a go.
>>
>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>> have variable length vector modes?
>>
>> Right.  1 is the default and only AArch64 defines it to anything else (2).
>
> Going to be interesting (bitrot) times then?  I wonder if it makes sense
> to initially define it to 2 globally and only change it to 1 later?

Well, the target-independent code doesn't have the implicit conversion
from poly_int<1, C> to C, so it can't e.g. do:

  poly_int64 x = ...;
  HOST_WIDE_INT y = x;

even when NUM_POLY_INT_COEFFS==1.  Only target-specific code (identified
by IN_TARGET_CODE) can do that.

So to target-independent code it doesn't really matter what
NUM_POLY_INT_COEFFS is.  Even if we bumped it to 2, the extra coefficient
would always be zero.

FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
supported) 3 for all targets, so that part isn't sensitive to
NUM_POLY_INT_COEFFS.

> Do you have any numbers on the effect of poly-int on compile-times?
> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?

I've just tried that for an x86_64 -j24 build and got:

  real: +7%
  user: +8.6%

I don't know how noisy the results are though.

It's compile-time neutral in terms of running a gcc built with
--enable-checking=release, within a margin of about [-0.1%, 0.1%].

Thanks,
Richard
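
For readers following the quoted patch description, the 9 + 1 bit split it
describes can be sketched roughly as below.  This is only an illustration of
the scheme as worded in the thread, not the code from the patch: the names
Subparts, exact_log2, valid_subparts, encode and decode are hypothetical, and
the real accessor works on tree nodes rather than a standalone struct.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a two-coefficient element count: the value
// represented is constant + per_x * X, where X is only known at run time.
struct Subparts {
  uint64_t constant;  // N
  uint64_t per_x;     // N for variable-length vectors, 0 for constant-length
};

constexpr unsigned kFieldBits = 10;  // width of the precision field

// Returns log2(n) if n is a power of two, otherwise nullopt.
std::optional<unsigned> exact_log2(uint64_t n) {
  if (n == 0 || (n & (n - 1)) != 0) return std::nullopt;
  unsigned log = 0;
  while (n >>= 1) ++log;
  return log;
}

// True if the count has one of the two encodable forms described in the
// thread: N (constant-length) or N + N * X (variable-length), N a power of 2.
bool valid_subparts(const Subparts &s) {
  if (!exact_log2(s.constant)) return false;
  return s.per_x == 0 || s.per_x == s.constant;
}

// Low bit: 0 = constant-length, 1 = variable-length.  Upper 9 bits: log2(N).
uint16_t encode(const Subparts &s) {
  assert(valid_subparts(s));
  unsigned log = *exact_log2(s.constant);
  unsigned variable = s.per_x != 0 ? 1u : 0u;
  unsigned field = (log << 1) | variable;
  assert(field < (1u << kFieldBits));
  return static_cast<uint16_t>(field);
}

// Inverse of encode.  Assumes the stored log2 value is below 64 so that the
// shift is well defined for uint64_t.
Subparts decode(uint16_t field) {
  unsigned log = field >> 1;
  assert(log < 64);
  uint64_t n = uint64_t{1} << log;
  bool variable = (field & 1) != 0;
  return Subparts{n, variable ? n : 0};
}
```

Under these assumptions a constant count of 4 is stored as 4 (log2 of 2 in the
upper bits, flag clear) and 4 + 4 * X as 5.  A count with a non-power-of-2 N,
such as 3 times a power of 2, has no encoding, which is the kind of case the
description says valid_vector_subparts_p rejects.  Moving the flag to a high
bit, as suggested in the review, would change only the shift and mask in
encode and decode.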