On 09/11/2018 09:28 AM, Palmer Dabbelt wrote:
>> The RISC-V vector extension described something other than what is
>> present in the currently released 2.2 standard.  To clarify the
>> language within this message, based on what I remember:
> 
> Yes.  The current RISC-V ISA standard contains no vector instructions; they
> will be added under the "V" extension as part of a future revision of the
> RISC-V standard.  This is how we manage the standard: as new revisions of the
> ISA manual come out we can add new extensions, but we can never change or
> remove an existing extension.

Well, right, but it does have a draft of the V extension.
What was presented did not match that, which is what I was trying to describe.

>> We posited new instructions, vspill and vfill, that ignore VL, ignore
>> predication, and operate on all MAXVL elements of MAXEL.  This allows
>> the compiler to save and restore the entire contents of the register
>> without knowing the current configuration.
> 
> While I'm not part of the vector working group, I'd anticipate these sorts of
> instructions don't make it into the V extension because they leak too much
> about the microarchitecture to software.  One of the goals of the V extension
> is to allow for software compatibility between different implementations, and
> instructions with semantics like these tend to lead to incompatible software.

Pardon?  How do they leak micro-architecture detail?
They load and store the *architectural* contents of the registers.
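
To make the distinction concrete, here is a rough C model of what I mean.
It is only a sketch: the register size constants and the model_* names are
made up for illustration, and nothing here is a real instruction encoding or
intrinsic.  The point is just that an ordinary store is bounded by vl, while
vspill/vfill move the full architectural contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for the model: a register holds MAXVL
   elements, each MAXEL_BYTES wide.  Real values depend on vconfig. */
enum { MAXVL = 8, MAXEL_BYTES = 4 };

typedef struct {
    uint8_t bytes[MAXVL * MAXEL_BYTES];  /* full architectural contents */
} vreg_t;

/* Ordinary vector store: writes only the first vl elements. */
static void model_vstore(uint8_t *mem, const vreg_t *v, size_t vl)
{
    assert(vl <= MAXVL);
    memcpy(mem, v->bytes, vl * MAXEL_BYTES);
}

/* vspill as described above: ignores vl and predication and writes
   every byte of the register. */
static void model_vspill(uint8_t *mem, const vreg_t *v)
{
    memcpy(mem, v->bytes, sizeof v->bytes);
}

/* vfill: the inverse of vspill, restoring every byte of the register. */
static void model_vfill(vreg_t *v, const uint8_t *mem)
{
    memcpy(v->bytes, mem, sizeof v->bytes);
}
```

A spill followed by a fill restores the register bit for bit regardless of
the current vl, which is what a compiler needs when it does not know the
configuration.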


> Additionally, I don't think this is necessary because our proposed vector ABI
> is to clobber the entire state of the vector unit on all function calls.

Yes, but I was foreshadowing...

>> (II) We talked about the needs of a "simd" abi

... this, in which we would not necessarily know the vconfig.

> Must is a strong word, but I agree that we should at least ensure that it's
> possible to define a sane ABI that saves vector registers around function
> calls and passes arguments via vector registers.  In other words: I think
> we'll still want to support something like "-march=rv64gcv -mabi=lp64d",
> but I don't think we want to preclude ourselves from
> "-march=rv64gcv -mabi=lp64dv" being better.
> 
> I think the best way to go about this is to figure out what features of an ABI
> might be worth having, and then to enumerate the mechanisms that a V-style ISA
> extension must provide in order to sanely implement such an ABI.  Essentially
> we've still got time to change the ISA, so let's just design a good ABI,
> figure out what's necessary from the ISA to implement said ABI, and then make
> sure that's in the standard.

Sure.

> 
> The ABI features I can think of are:
> 
> * Passing at least one argument in a vector register.
>    - Presumably we'll clobber vector argument registers on calls, like we do
>      for everything else.  Thus there isn't any ISA requirement here.
>    - How does one go about indicating at the C level that an argument is
>      passed in a register?  If we just say "any __attribute__((vector)) of
>      length less than N bytes/elements" then N must be less than the ISA
>      mandated minimum vector length (IIRC 4 elements?) -- that might be OK.

Here I think you need to read the SVE document.

I would not use this abi for __attribute__((vector(fixed-size))) at all, but
for the variable length vectors that the auto-vectorizer uses, since that's
exactly what these functions are for.

> * Saving the contents of at least one vector register across a function call.
>   In order to do so we need:
>    - A mechanism for determining the number of bytes used by a vector
>      register, to reserve stack space.
>    - A mechanism for saving a vector register to the stack.  This could be a
>      simple vector store, but if we want to maintain the entire register (as
>      opposed to just the first vl elements) we need

This is exactly what I was talking about above for vspill/vfill.

> * Saving vl across a function call.
>    - We need a mechanism for determining the vector length.  Currently the
>      only way to do so is destructive, we'd need a non-destructive way to do
>      so.
> * Saving vconfig across a function call.
>    - There is no way to determine the config, we'd need a way to do so.

Correct.

I will note that the above addvsz can be used as "addvsz tmp, x0, 1" to extract
VSZ.  I can't think of how often extracting MAXEL and MAXVL individually would
be useful, so maybe just being able to get them from a read-vconfig insn would
be enough.
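
As a rough illustration of the "addvsz tmp, x0, 1" trick, here is a small C
model.  It assumes addvsz computes rd = rs1 + imm * VSZ, which is my reading
of the usage above and not a quote from any spec; model_vsz and the other
model_* names are hypothetical stand-ins for hardware state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bytes occupied by one vector register under the current
   vconfig.  On real hardware this is not known at compile time. */
static uint64_t model_vsz = 64;

/* Assumed semantics of addvsz: rd = rs1 + imm * VSZ (wrapping arithmetic,
   as for any register add). */
static uint64_t model_addvsz(uint64_t rs1, int64_t imm)
{
    return rs1 + (uint64_t)imm * model_vsz;
}

/* "addvsz tmp, x0, 1": x0 always reads as zero, so the result is VSZ. */
static uint64_t model_read_vsz(void)
{
    return model_addvsz(0, 1);
}

/* Reserve stack space for nregs spilled registers: new sp = sp - nregs*VSZ.
   Stack alignment is deliberately ignored in this sketch. */
static uint64_t model_alloc_spill_area(uint64_t sp, int nregs)
{
    assert(nregs >= 0);
    return model_addvsz(sp, -(int64_t)nregs);
}
```

The same instruction therefore covers both needs from the list above:
sizing the save area and reading the register size non-destructively.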

> My proposed vector ABI is:
> 
> * Don't pass any vector arguments in registers.

If you're going to do that, why define a new ABI at all?

>> (II-a) The callee must know how many registers are enabled by vconfig.
>>
>> The simplest solution is simply to require all 32 registers to be enabled.
>>
>> Expanding on this slightly, one could require a reduced set N (e.g. 16)
>> and define this as the abi.  This would trade off potentially unused
>> registers and potentially more spilling for longer vectors in the
>> (presumably) common case.
>>
>> One could require N registers by default and override this by an
>> explicit target-specific clause in the #pragma.  This would allow
>> programmers to tune the compiler output (bearing in mind that changing
>> the clause changes the function abi), while also providing a sensible
>> default for code that has not been explicitly tuned for a given risc-v
>> implementation.
> 
> Makes sense -- my only worry here is that we're leaving a lot on the floor. 
> Maybe this is just because I'm not really a vector guy, but my biggest worry
> with the vector unit is ensuring that memcpy() and friends are reasonably
> efficient.

For memcpy, that's always going to be a normal abi, so it can legitimately
clobber all of the vector registers in any way it likes -- e.g. reconfig to
maximize byte vector length.


> I'm a bit worried about throwing a factor of 32 in vector length on
> the floor here (or requiring saving a huge vector state),

Jakub talked a bit about this in his reply.


> particularly as I
> think that most vectorized code won't need to worry about calling standard ABI
> functions.

Well, yes, most things that we can vectorize don't need this.
But loops that would use this ABI would otherwise be non-vectorizable.
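
For a concrete picture of the kind of loop I mean, here is a small example,
assuming the "simd" abi refers to OpenMP "declare simd" functions.  The
function names are made up.  Without a vector variant of the callee, the call
in the loop body keeps the loop scalar; with one, the vectorizer can call the
vector variant and pass whole vectors of x.

```c
#include <assert.h>
#include <stddef.h>

/* Ask the compiler to also emit vector variants of this function.  The
   pragma is ignored unless OpenMP SIMD support is enabled (for GCC,
   -fopenmp or -fopenmp-simd). */
#pragma omp declare simd notinbranch
float scale_and_clamp(float x, float k)
{
    float y = x * k;
    return y > 1.0f ? 1.0f : y;
}

/* The call in the loop body is what makes the vector function abi matter:
   a vectorized loop has to pass vectors of src[i] to the callee. */
void apply(float *restrict dst, const float *restrict src, float k, size_t n)
{
    assert(dst != NULL && src != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = scale_and_clamp(src[i], k);
}
```

Whether the vector variant is actually used depends on the target and on the
compiler; the code is correct either way, since the pragmas only enable the
optimization.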


r~
