tra added inline comments.

================
Comment at: clang/test/CodeGenCUDA/lambda.cu:16-26
+template<class F>
+__global__ void g(F f) { f(); }
+
+template<class F>
+void h(F f) { g<<<1,1>>>(f); }
+
+__device__ int a;
----------------
yaxunl wrote:
> tra wrote:
> > The test example may not be doing what it's seemingly supposed to be doing:
> > https://cppinsights.io/s/3a5c42ff
> > 
> > `h()` gets a temporary host-side object which keeps the reference to `a` 
> > and that reference will actually point to the host-side shadow of the 
> > actual device-side `a`. When you get to execute `g` it's `this` may not be 
> > very usable on device side and thus `f.operator()` will probably not work.
> > 
> > Alas, we currently have no diagnostics for that kind of error.
> > 
> > Change it to a non-capturing lambda, perhaps?
> It works.
> 
> We need to think about this in device compilation. In device compilation, 
> global variable is a device variable, the lambda is a device host function, 
> therefore the lambda is accessing the real a, not the shadow.
> 
> In the host compilation, the lambda is not really called, therefore it is not 
> emitted.
> 
> I will update the lit test with these checks.
Clang manages to see through to the initializer of `a`, but I'm not sure how 
much we can rely on this.
In general, `f.operator()` for a capturing lambda needs to access captured 
variables via `this` which points to a temporary objects created and passed to 
`g` by the host. You can see it if you capture a local variable: 
https://godbolt.org/z/99389o

Anyways, it's an issue orthogonal to this patch. My concern is that tests are 
often used as an example of things that are OK to do, and capturing lambdas are 
a pretty big foot-shooting gun when used with CUDA. It's very easy to do wrong 
thing without compiler complaining about them.
While accessing `a` does work, it appears to do so by accident, rather than by 
design. 

I'm fairly confident that I can hide the initializer with sufficiently 
complicated code, force clang to access `a` via `this` and make everything fail 
at runtime. IMO, what we have here is a 'happens to work' situation. I do not 
want to call it 'guaranteed to work' without making sure that it always does.

In order to demonstrate that lambda is host/device, you do not need it to be a 
capturing lambda. You can make it call an overloaded function with host and 
device variants and verify that the lambda works on host and device sides.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to