On 09/11/2018 09:28 AM, Palmer Dabbelt wrote:
>> The RISC-V vector extension described something other than what is
>> present in the currently released 2.2 standard. To clarify the
>> language within this message, based on what I remember:
>
> Yes. The current RISC-V ISA standard contains no vector instructions, they
> will be added under the "V" extension as part of a future revision of the
> RISC-V standard. This is how we manage the standard: as new revisions of the
> ISA manual come out we can add new extensions, but we can never change or
> remove an existing extension.

Well, right, but it does have a draft of the V extension. What was presented
did not match that, which is what I was trying to describe.

>> We posited new instructions, vspill and vfill, that ignore VL, ignore
>> predication, and operate on all MAXVL elements of MAXEL. This allows
>> the compiler to save and restore the entire contents of the register
>> without knowing the current configuration.
>
> While I'm not part of the vector working group, I'd anticipate these sorts of
> instructions don't make it into the V extension because they leak too much
> about the microarchitecture to software. One of the goals of the V extension
> is to allow for software compatibility between different implementations, and
> instructions with semantics like these tend to lead to incompatible software.

Pardon? How do they leak micro-architecture detail? They load and store the
*architectural* contents of the registers.

> Additionally, I don't think this is necessary because our proposed vector ABI
> is to clobber the entire state of the vector unit on all function calls.

Yes, but I was foreshadowing...

>> (II) We talked about the needs of a "simd" abi ... this, in which we
>> would not necessarily know the vconfig.
>
> Must is a strong word, but I agree that we should at least ensure that it's
> possible to define a sane ABI that saves vector registers around function
> calls and passes arguments via vector registers. In other words: I think
> we'll still want to support something like "-march=rv64gcv -mabi=lp64d",
> but I don't think we want to preclude ourselves from
> "-march=rv64gcv -mabi=lp64dv" being better.
>
> I think the best way to go about this is to figure out what features of an ABI
> might be worth having, and then to enumerate the mechanisms that an V-style
> ISA extension must provide in order to sanely implement such an ABI.
> Essentially we've still got time to change the ISA, so let's just design a
> good ABI, figure out what's necessary from the ISA to implement said ABI,
> and then make sure that's in the standard.

Sure.

> The ABI features I can think of are:
>
> * Passing at least one argument in a vector register.
>   - Presumably we'll clobber vector argument registers on calls, like we do
>     for everything else. Thus there isn't any ISA requirement here.
>   - How does one go about indicating at the C level that an argument is
>     passed in a register? If we just say "any __attribute__((vector)) of
>     length less than N bytes/elements" then N must be less than the ISA
>     mandated minimum vector length (IIRC 4 elements?) -- that might be OK.

Here I think you need to read the SVE document. I would not use this abi for
__attribute__((vector(fixed-size))) at all, but for the variable length
vectors that the auto-vectorizer uses, since that's exactly what these
functions are for.

> * Saving the contents of at least one vector register across a function call.
>   In order to do so we need:
>   - A mechanism for determining the number of bytes used by a vector
>     register, to reserve stack space.
>   - A mechanism for saving a vector register to the stack. This could be a
>     simple vector store, but if we want to maintain the entire register (as
>     opposed to just the first vl elements) we need

This is exactly what I was talking about above for vspill/vfill.

> * Saving vl across a function call.
>   - We need a mechanism for determining the vector length. Currently the
>     only way to do so is destructive, we'd need a non-destructive way to do
>     so.
> * Saving vconfig across a function call.
>   - There is no way to determine the config, we'd need a way to do so.

Correct. I will note that the above addvsz can be used as "addvsz tmp, x0, 1"
to extract VSZ.
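
To make the addvsz point concrete, here is a rough C model of how a prologue
could size a spill area without knowing the configuration at compile time.
This is only a sketch under assumptions: it takes addvsz rd, rs1, imm to mean
rd = rs1 + imm * VSZ (which is what makes "addvsz tmp, x0, 1" yield VSZ), and
it takes VSZ to be MAXVL elements times MAXEL bytes. The struct and the
function names are hypothetical, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the vector configuration; not a real interface. */
typedef struct {
    uint64_t maxvl;       /* elements per vector register */
    uint64_t maxel_bytes; /* bytes per element */
} vconfig_model;

/* ASSUMPTION: VSZ (bytes per vector register) is MAXVL * MAXEL. */
static uint64_t vsz(const vconfig_model *cfg) {
    assert(cfg != NULL);
    return cfg->maxvl * cfg->maxel_bytes;
}

/* ASSUMPTION: "addvsz rd, rs1, imm" computes rd = rs1 + imm * VSZ.
 * Unsigned wraparound gives the right answer for negative imm. */
static uint64_t addvsz(const vconfig_model *cfg, uint64_t rs1, int64_t imm) {
    return rs1 + (uint64_t)imm * vsz(cfg);
}

/* Prologue sketch: move sp down by room for nregs whole registers,
 * i.e. the equivalent of "addvsz sp, sp, -nregs". */
static uint64_t reserve_spill_area(const vconfig_model *cfg, uint64_t sp,
                                   unsigned nregs) {
    return addvsz(cfg, sp, -(int64_t)nregs);
}
```

With x0 as the base and an immediate of 1 the model returns VSZ itself, and a
negative immediate applied to sp reserves the space that vspill/vfill would
then use.
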
I can't think of how often extracting MAXEL and MAXVL individually would be
useful, so maybe just being able to get them from a read-vconfig insn would
be enough.

> My proposed vector ABI is:
>
> * Don't pass any vector arguments in registers.

If you're going to do that why define a new ABI at all?

>> (II-a) The callee must know how many registers are enabled by vconfig.
>>
>> The simplest solution is simply to require all 32 registers to be enabled.
>>
>> Expanding on this slightly, one could require a reduced set N (e.g. 16)
>> and defined this as abi. This would trade off potentially unused
>> registers and potentially more spilling for longer vectors in the
>> (presumably) common case.
>>
>> One could require N registers by default and override this by an
>> explicit target-specific clause in the #pragma. This would allow
>> programmers to tune the compiler output (bearing in mind that changing
>> the clause changes the function abi), while also providing a sensible
>> default for code that has not been explicitly tuned for a given risc-v
>> implementation.
>
> Makes sense -- my only worry here is that we're leaving a lot on the floor.
> Maybe this is just because I'm not really a vector guy, but my biggest worry
> with the vector unit is ensuring that memcpy() and friends are reasonably
> efficient.

For memcpy, that's always going to be a normal abi, so it can legitimately
clobber all of the vector registers in any way it likes -- e.g. reconfig to
maximize byte vector length.

> I'm a bit worried about throwing a factor of 32 in vector length on
> the floor here (or requiring saving a huge vector state),

Jakub talked a bit about this in his reply.

> particularly as I
> think that most vectorized code won't need to worry about calling standard ABI
> functions.

Well, yes, most things that we can vectorize don't need this. But loops that
would use this ABI would otherwise be non-vectorizable.


r~