Re: RISC-V vector extension cauldron discussion

Palmer Dabbelt Fri, 28 Sep 2018 18:46:04 -0700

On Tue, 11 Sep 2018 14:34:24 PDT (-0700), r...@twiddle.net wrote:

On 09/11/2018 09:28 AM, Palmer Dabbelt wrote:

The RISC-V vector extension described something other than what is
present in the currently released 2.2 standard.  To clarify the
language within this message, based on what I remember:


Yes.  The current RISC-V ISA standard contains no vector instructions, they
will be added under the "V" extension as part of a future revision of the
RISC-V standard.  This is how we manage the standard: as new revisions of the
ISA manual come out we can add new extensions, but we can never change or
remove an existing extension.


Well, right, but it does have a draft of the V extension.
What was presented did not match that, which is what I was trying to describe.

Ah, I didn't know that. I guess I should look at our ISA manual more often...
:) Regardless, the presented extensions have drifted from v2.2 in various ways
such that I'm no longer sure what is real any more.

We posited new instructions, vspill and vfill, that ignore VL, ignore
predication, and operate on all MAXVL elements of MAXEL.  This allows
the compiler to save and restore the entire contents of the register
without knowing the current configuration.


While I'm not part of the vector working group, I'd anticipate these sorts of
instructions don't make it into the V extension because they leak too much
about the microarchitecture to software.  One of the goals of the V extension
is to allow for software compatibility between different implementations, and
instructions with semantics like these tend to lead to incompatible software.


Pardon?  How do they leak micro-architecture detail?
They load and store the *architectural* contents of the registers.

Ya, I think I might have been wrong here. When I actually went through this I
think there might be a way to implement these with reasonable performance and
without leaking any micro-architectural state.

My worry here was exactly how this whole "ignore VL" idea interplays with what
values are allowed to exist in registers, which keeps flopping back and forth
based on how much type support we're baking into the base ISA and who yells
loudest in the meetings. IIRC the current proposal is to have something like


  getvl t0 <- vl [imaginary instruction so we can restore vl]
  setvl 0
  vxor v0, v0
  setvl t0

be defined to change no V state, in which case I think it would be possible to
define a sane version of these instructions. I also think that a reasonable
class of vector ABIs might rely on these, so we should probably figure out how
to make them work.
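
As a rough illustration of the property described above, here is a minimal C
sketch of a toy vector unit. Everything in it is an assumption made for
illustration: the `vstate` type, the `MAXVL` value, and the rule that setvl
leaves register contents alone while element-wise operations write only the
first vl elements. It is not the V extension's actual semantics, only a model
in which the four-instruction sequence changes no vector state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXVL 8   /* hypothetical: elements per register in this toy model */
#define NREGS 32

typedef struct {
    uint32_t v[NREGS][MAXVL]; /* architectural register contents */
    size_t vl;                /* current vector length */
} vstate;

/* Imaginary non-destructive read of vl, as in the message. */
static size_t getvl(const vstate *s) { return s->vl; }

/* Assumed semantics: setvl only changes vl, never register contents. */
static void setvl(vstate *s, size_t n) { s->vl = n < MAXVL ? n : MAXVL; }

/* vxor vd, vs: vd ^= vs, applied to the first vl elements only. */
static void vxor(vstate *s, int vd, int vs)
{
    assert(s->vl <= MAXVL);
    for (size_t i = 0; i < s->vl; i++)
        s->v[vd][i] ^= s->v[vs][i];
}

/* The sequence from the message: with vl == 0 the xor touches nothing. */
static void zero_length_xor(vstate *s)
{
    size_t t0 = getvl(s);
    setvl(s, 0);
    vxor(s, 0, 0);
    setvl(s, t0);
}
```

Under these assumptions, calling `zero_length_xor` leaves both `vl` and every
register element as they were, whereas `vxor(s, 0, 0)` at a non-zero vl clears
only the first vl elements of v0.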

Additionally, I don't think this is necessary because our proposed vector ABI
is to clobber the entire state of the vector unit on all function calls.


Yes, but I was foreshadowing...

(II) We talked about the needs of a "simd" abi


... this, in which we would not necessarily know the vconfig.


I think we start agreeing in a hundred lines or so... :)

Must is a strong word, but I agree that we should at least ensure that it's
possible to define a sane ABI that saves vector registers around function calls
and passes arguments via vector registers.  In other words: I think we'll still
want to support something like "-march=rv64gcv -mabi=lp64d", but I don't think
we want to preclude ourselves from "-march=rv64gcv -mabi=lp64dv" being better.

I think the best way to go about this is to figure out what features of an ABI
might be worth having, and then to enumerate the mechanisms that a V-style ISA
extension must provide in order to sanely implement such an ABI.  Essentially
we've still got time to change the ISA, so let's just design a good ABI, figure
out what's necessary from the ISA to implement said ABI, and then make sure
that's in the standard.


Sure.


The ABI features I can think of are:

* Passing at least one argument in a vector register.
   - Presumably we'll clobber vector argument registers on calls, like we do
     for everything else.  Thus there isn't any ISA requirement here.
   - How does one go about indicating at the C level that an argument is
     passed in a register?  If we just say "any __attribute__((vector)) of
     length less than N bytes/elements" then N must be less than the ISA
     mandated minimum vector length (IIRC 4 elements?) -- that might be OK.


Here I think you need to read the SVE document.

Yes, I agree that I should read the SVE document. In fact, I opened it and
scanned through it and thought "gee, I should really read this". By the time I
got through the rest of your email the cauldron was almost over, and I figured
I should send something.

I would not use this abi for __attribute__((vector(fixed-size))) at all, but
for the variable length vectors that the auto-vectorizer uses, since that's
exactly what these functions are for.

* Saving the contents of at least one vector register across a function call.
  In order to do so we need:
   - A mechanism for determining the number of bytes used by a vector
     register, to reserve stack space.
   - A mechanism for saving a vector register to the stack.  This could be a
     simple vector store, but if we want to maintain the entire register (as
     opposed to just the first vl elements) we need


This is exactly what I was talking about above for vspill/vfill.

Yes, and I think that by this point I'd already convinced myself you were
right. Sorry for being somewhat incoherent, I wrote my response at about one
line per hour because I learned a lot while doing so.
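
To make the difference between a plain vector store and the posited
vspill/vfill concrete, here is a small C model. The names `vreg`, `vstore`,
`vload` and the `MAXVL` value are hypothetical, the element width is fixed at
32 bits, and predication is not modeled; the only point is that a vl-limited
save loses the tail of the register while a save that ignores vl does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXVL 8   /* hypothetical: elements per register in this toy model */

typedef struct { uint32_t elem[MAXVL]; } vreg;

/* Ordinary vector store/load: only the first vl elements move. */
static void vstore(uint32_t *mem, const vreg *r, size_t vl)
{
    assert(vl <= MAXVL);
    memcpy(mem, r->elem, vl * sizeof r->elem[0]);
}

static void vload(vreg *r, const uint32_t *mem, size_t vl)
{
    assert(vl <= MAXVL);
    memcpy(r->elem, mem, vl * sizeof r->elem[0]);
}

/* vspill/vfill as posited: ignore vl and move all MAXVL elements. */
static void vspill(uint32_t *mem, const vreg *r)
{
    memcpy(mem, r->elem, sizeof r->elem);
}

static void vfill(vreg *r, const uint32_t *mem)
{
    memcpy(r->elem, mem, sizeof r->elem);
}
```

A caller that saves with `vstore(slot, &r, vl)` and restores with
`vload(&r, slot, vl)` gets back only the first vl elements; the
`vspill`/`vfill` pair restores the whole register without the caller knowing
vl at all.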

* Saving vl across a function call.
   - We need a mechanism for determining the vector length.  Currently the
     only way to do so is destructive, we'd need a non-destructive way to do
     so.
* Saving vconfig across a function call.
   - There is no way to determine the config, we'd need a way to do so.


Correct.

I will note that the above addvsz can be used as "addvsz tmp, x0, 1" to extract
VSZ.  I can't think of how often extracting MAXEL and MAXVL individually would
be useful, so maybe just being able to get them from a read-vconfig insn would
be enough.
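
The addvsz usage quoted above can be sketched in C. The definition of addvsz
is not part of this message, so the semantics below (rd = rs + imm * VSZ) are
an assumption chosen to be consistent with "addvsz tmp, x0, 1" yielding VSZ;
the `VSZ` value and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VSZ 64  /* hypothetical: bytes per vector register in this model */

/* Assumed semantics of "addvsz rd, rs, imm": rd = rs + imm * VSZ. */
static int64_t addvsz(int64_t rs, int64_t imm)
{
    return rs + imm * VSZ;
}

/* "addvsz tmp, x0, 1": x0 reads as zero, so the result is VSZ itself. */
static int64_t read_vsz(void)
{
    return addvsz(0, 1);
}

/* Reserve stack space for nregs spilled vector registers. */
static int64_t reserve_spill_slots(int64_t sp, int nregs)
{
    assert(nregs >= 0);
    return addvsz(sp, -(int64_t)nregs);
}
```

With these assumptions a prologue can move the stack pointer by a multiple of
the register size in one instruction, without first materializing VSZ in a
scratch register.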

Yes. I was trying to avoid explicitly describing an instruction encoding of
what was necessary, as once we get into encodings we'll get painted into a
corner eventually. I generally like to try to figure out what information is
necessary, and how fast we might need to obtain that information, before trying
to pack it into an encoding.

My proposed vector ABI is:

* Don't pass any vector arguments in registers.


If you're going to do that why define a new ABI at all?

Ya, I think I might have gone too far down the rabbit hole of "let's not define
a new ABI". My biggest concern here is how the ABI maps to user code, which is
where I really need to read for a bit.

(II-a) The callee must know how many registers are enabled by vconfig.

The simplest solution is simply to require all 32 registers to be enabled.

Expanding on this slightly, one could require a reduced set N (e.g. 16)
and define this as abi.  This would trade off potentially unused
registers and potentially more spilling for longer vectors in the
(presumably) common case.

One could require N registers by default and override this by an
explicit target-specific clause in the #pragma.  This would allow
programmers to tune the compiler output (bearing in mind that changing
the clause changes the function abi), while also providing a sensible
default for code that has not been explicitly tuned for a given risc-v
implementation.
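
For readers unfamiliar with the kind of function a "simd" abi applies to, here
is a short C sketch. It assumes the #pragma mentioned above is OpenMP's
`declare simd`, which this message does not state explicitly; the function
names are made up for the example. No register-count clause is shown, since
the target-specific clause discussed above is only a proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Standard OpenMP directive: ask the compiler to also emit a vector
 * variant of this function, callable from vectorized loops. */
#pragma omp declare simd uniform(a) notinbranch
float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* A loop the vectorizer could only handle by calling a vector variant
 * of scale_add, which is where a vector calling convention matters. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = scale_add(a, x[i], y[i]);
}
```

Without OpenMP enabled the pragmas are ignored and the code runs as ordinary
scalar C, so the sketch behaves the same either way.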


Makes sense -- my only worry here is that we're leaving a lot on the floor. 
Maybe this is just because I'm not really a vector guy, but my biggest worry
with the vector unit is ensuring that memcpy() and friends are reasonably
efficient.


For memcpy, that's always going to be a normal abi, so it can legitimately
clobber all of the vector registers in any way it likes -- e.g. reconfig to
maximize byte vector length.

I'm a bit worried about throwing a factor of 32 in vector length on
the floor here (or requiring saving a huge vector state),


Jakub talked a bit about this in his reply.

Yes, I saw this. My worry here is that these sorts of things are in the realm
of the microarchitecture leaking into the ABI, and I'd really like to avoid
that. It might be a bit of a pipe dream, but I'd still like to give it a shot
-- I'm really trying to avoid a huge ABI explosion in RISC-V land, as that's a
mess.

particularly as I
think that most vectorized code won't need to worry about calling standard ABI
functions.


Well, yes, most things that we can vectorize don't need this.
But loops that would use this ABI would otherwise be non-vectorizable.

Can we actually just sit down with everyone and talk about this at some point?
We're doing RISC-V things at Plumbers, ELC-E, and FOSDEM (as well as the RISC-V
Summit). I feel like doing this over email is going to be inefficient, largely
because I'm just going to be stuck making stupid responses for a while since I
really don't know what I'm doing here.


Thanks for spending so much time on this!
