wingo added a subscriber: tlively.
wingo added a comment.

In D108464#2957276 <https://reviews.llvm.org/D108464#2957276>, @wingo wrote:

> Sooooo... besides the refactor, this is getting closer to where I'm going in 
> https://lists.llvm.org/pipermail/cfe-dev/2021-July/068559.html, though still 
> NFC.  I think you can see where I would replace `getASTAllocaAddressSpace` 
> with `getAllocaAddressSpace(QualType Ty)`, and possibly (depending on the 
> source language) avoid casting the resulting alloca to `LangAS::Default`.  
> WDYT, is this sort of thing OK?

Taking this patch as perhaps a better generic discussion point, @rjmccall 
graciously gave some general feedback on this approach (thank you!!!):

In D108360#2957844 <https://reviews.llvm.org/D108360#2957844>, @rjmccall wrote:

> I'm not sure that I agree with your overall plan, though:
>
> - The WebAssembly operand stack is not a good match for an address space at 
> the language level because it's not addressable at all.  If you can't 
> meaningfully have a pointer into the address space, then you don't really 
> need this in the type system; it's more like a declaration modifier at best.
> - Allocating local variables on the operand stack ought to be a very 
> straightforward analysis in the backend.  There's not much optimization value 
> in trying to do it in the frontend, and it's going to be problematic for 
> things like coroutine lowering.
> - The security argument seems pretty weak, not because security isn't 
> important but because this is not really an adequate basis for getting the 
> tamper-proof guarantee you want.  For example, LLVM passes can and do 
> introduce its own allocas and store scalars into them sometimes.  Really you 
> need some sort of "tamper-proof" *type* which the compiler can make an 
> end-to-end guarantee of non-tamper-ability for the values of, and while 
> optimally this would be implemented by just keeping values on the operand 
> stack, in the general case you will need to have some sort of strategy for 
> keeping things in memory.

Thanks for thinking about this!  Indeed I started out with the goal of not 
going deep into clang and if it's possible to avoid going too deeply, that 
would be better for everyone involved.  I am starting to think however that it 
may be unavoidable for me at least.

So, I am focusing on WebAssembly global and local variables; the WebAssembly 
operand stack is an artifact of the IR-to-MC lowering and AFAICS doesn't have 
any bearing on what clang does -- though perhaps I am misunderstanding what you 
are getting at here.  The issue is not to allocate locals on the operand stack, 
but rather to allocate them as part of the "locals" of a WebAssembly function 
<https://webassembly.github.io/spec/core/syntax/modules.html#functions>.  Cc 
@tlively on the WebAssembly side.

I agree that the security argument is weak: it's something but it's not the 
real motivation.

The main motivator is the ability to have "reference type" 
<https://webassembly.github.io/spec/core/syntax/types.html#reference-types> 
(`externref`/`funcref`) locals and globals 
<https://webassembly.github.io/spec/core/syntax/modules.html#globals> at all.  
Reference-typed values can't be stored to linear memory.  They have no size and 
no byte representation -- they are opaque values from the host.  However, 
WebAssembly locals and globals can define storage locations of type `externref` 
or `funcref`.  The storage locations for WebAssembly locals and globals are not 
in linear memory, and are not addressable by pointer at run-time -- accesses to 
them are always by immediate.

Currently, clang always produces LLVM IR that allocates C++ globals and locals 
in linear memory.  LLVM may transform some of these to WebAssembly globals or 
locals at its discretion.  This strategy works because all values for the 
initial set of types supported by WebAssembly can be stored to linear memory; 
what you can do with a WebAssembly global or local was a subset of what you 
could do with linear-memory globals and alloca locals.

However, with reference types (merged into the spec earlier this year), this is 
no longer the case -- there are now types representable in WebAssembly 
globals/locals which can't be represented in linear memory.

Because of the limitations in how WebAssembly globals and locals can be used, 
reference-typed values have associated semantic restrictions in the front-end.  
If I declare a C++ local of type externref (which must be allocated to a 
WebAssembly local), I can't take its address:

  void f() {
    externref_t x = g();
    h(&x); // error
  }

Similarly I can't put an externref in an aggregate type that itself is 
allocated in linear memory:

  // global
  struct { int x; externref_t y; } z; // error

But, if we add a generic OpenCL-like address space attribute, that would allow 
the user to declare some variables to be in alternate address spaces.  Then we 
can apply the ACLE SVE semantic restrictions to these values also, and add on 
an additional restriction preventing address-of.  That way users get to make 
off-heap definitions, and if they misuse them, they get comprehensible errors.  
LLVM IR and WebAssembly lowering is ready for these alternate-address-space 
allocations.

  // global
  struct { int x; externref_t y; } z __attribute__((wasm_var)); // ok

The builtin `externref_t` and `funcref_t` types would probably already have 
this attribute.  (I don't have a complete clang patchset yet, so if you prefer 
to wait to see what things look like, this is perfectly ok.)  But because in 
the future there will be more kinds of reference types, and that we might want 
to have have aggregate types which include both number and reference types, it 
seems that there are two separable concerns here: one about applying the 
semantic restrictions for WebAssembly global and local storage locations, and 
another concern about handling "opaque" values (`externref`) which doesn't 
impose additional Sema/ restrictions.

The restrictions needed for WebAssembly globals and locals are essentially the 
same, and they lower to the same LLVM IR address space for the WebAssembly 
target, hence I would propose a single `wasm_var` attribute instead of 
`wasm_global` and `wasm_local`.  This can change though if it's confusing.

Finally, I would note that it would be useful from an ABI point of view to be 
able to define named WebAssembly globals (but not locals) in C, if e.g. an 
external interface expects that a module export an `i32` global with name 
`foo`.  So this patch-set has that use-case also.

Regarding coroutine lowering, I can see how that can be challenging; would it 
be reasonable to restrict continuations to not include saved off-heap locals, 
for now?  If there were such a local, it would be a compilation error.

OK, lots of words.  Thanks for reading.  What do you think about this (ab)use 
of LangAS?  If there is a better way to cross reference types with C++, 
pointers are very much welcome.  I will have a better idea what the end size of 
the patch-set is within a couple weeks; I guess I would propose to continue 
posting the series and hope that the end set of core changes is stomache-able, 
and add you to Cc as I go.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108464/new/

https://reviews.llvm.org/D108464

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to